PREFACE


This  symposium  was  proposed  and  sponsored  by  the  Institute  for  Defense 
Analyses  (IDA)  to  use  the  talents  of  the  alumni  of  the  Defense  Science  Study  Group 
(DSSG)  to  address  important  issues  related  to  defense.  The  DSSG  is  a  program  of 
education  and  study  for  a  select  group  of  outstanding  young  professors  of  science  and 
engineering.  The  purpose  of  the  program  is  to  build  a  bridge  of  understanding  between 
these  professors  and  the  defense  community  so  that  they  will  be  better  prepared  to 
conduct  defense  related  research  and  to  serve  as  advisors,  consultants,  or  members  of 
study  groups  and  panels  of  the  DoD  and  other  components  of  the  U.S.  Government. 

The  symposium  was  held  at  IDA  from  October  31  through  November  2,  1994. 
The  theme  was  Applications  of  Advanced  and  Innovative  Computational  Methods  to 
Defense  Science  and  Engineering.  The  purpose  was  to  use  the  alumni  of  the  DSSG 
program  and  several  other  key  academics  as  a  core  group  to  discuss  advanced  and 
innovative  ways  that  computers  are  used  in  academic  research.  To  complement  the 
academic briefings, a group of DoD and DOE scientists presented briefings on their
defense-related research. Discussions centered around the important work, especially in
the  area  of  massively  parallel  processing,  that  is  being  done  by  the  two  groups  of 
researchers. 

The  theme  for  the  symposium  was  proposed  by  Dr.  Anita  K.  Jones,  the  Director, 
Defense  Research  and  Engineering.  Her  purpose  was  to  help  ensure  that  the  DoD  was 
benefiting  to  the  fullest  extent  possible  from  advanced  computational  methods  being  used 
in  academic  research.  Those  persons  responsible  for  arranging  the  symposium  are 
especially  indebted  to  Dr.  Jones  for  encouragement  and  support,  and  for  setting  the  tone 
of  the  symposium  with  her  keynote  address.  They  are  also  indebted  to  Professor  Steven 
E.  Koonin  who  served  so  effectively  as  the  moderator  of  the  symposium,  to  Dr.  Russell 
Herndon  who  helped  identify  key  speakers  from  the  DoD,  and  to  Dr.  Maile  E.  Smith  and 
Dr.  Norman  R.  Howes  of  IDA  who  prepared  this  report. 
DISCUSSION


A.  INTRODUCTION 

The  technologies  involved  in  parallel  computing  have  made  tremendous  strides  in 
the  last  25  years.  During  the  late  1960s,  the  first  commercial  multiple  processor  computers 
that  were  produced  incorporated  only  a  few  processors.  Today,  massively  parallel 
machines  have  been  produced  incorporating  over  1,000  processors,  although  typical 
configurations  tend  to  be  tens  of  processors  for  shared  memory  machines  and  as  many  as  a 
few  hundred  processors  for  distributed  memory  machines.  Networks  of  work  stations, 
with  hundreds  of  nodes,  have  been  used  to  solve  large  problems  in  science  and 
engineering.  The  history  of  high-performance  computing  (HPC)  shows  that  the 
Department of Defense (DoD) has played an active role in funding the development of
many advances in computing. During the last decade, the Advanced Research Projects
Agency (ARPA) has provided extensive funding for parallel architecture development,
culminating  in  the  U.S.  Government's  High  Performance  Computing  Initiative. 

Currently,  parallel  computer  techniques  are  used  by  researchers  to  solve  many 
problems  in  science  and  engineering.  The  fact  that  these  techniques  are  widely  used  is  a 
testament  to  their  successful  development.  The  scientific  community  has  embraced  parallel 
computing  to  allow  researchers  to  solve  problems  that  only  a  decade  ago  appeared 
intractable.  Industry,  with  a  few  notable  exceptions,  has  been  slow  to  adopt  parallel 
processing  to  aid  in  the  manufacturing  process  because  of  the  unavailability  of  software  and 
the  expense  of  large  parallel  machines.  The  main  use  of  parallel  processing  in  industry 
today  is  in  the  information  industry  for  database  servers,  where  the  software  is 
"transparent"  to  the  user.  The  entertainment  industry  is  becoming  a  large  user  of  HPC,  but 
the  extent  to  which  they  are  using  parallel  computing  is  not  clear.  What  is  clear,  however, 
is  that  the  DoD  was  instrumental  in  funding  the  development  of  parallel  computers,  but  the 
increase  in  their  use  among  non-DoD  users  has  resulted  in  a  market  where  the  DoD  is  not 
the  major  customer.  Thus,  the  DoD  may  find  it  practical  to  work  even  more  closely  with 
academia,  the  information  industry,  and  the  entertainment  industry  to  ensure  that  it 
continues  to  be  a  major  player  in  this  important  technical  area. 
The principal topic of discussion during the 3-day symposium was parallel
processing,  as  done  on  vector  machines,  massively  parallel  machines,  or  networks  of  work 
stations.  The  broader  topics  of  HPC,  the  Information  Superhighway,  and  development  of 
techniques  to  manage  large  databases  were  only  discussed  tangentially.  Presentations  were 
given  by  many  academic  and  government  researchers,  either  by  those  who  use  parallel 
computing  machines  in  their  work  or  by  those  who  are  working  in  the  field  of  computer 
science  to  develop  new  technologies  and  computer  architectures. 

Many  of  the  presentations  were  in  the  area  of  parallel  computing:  where  it  is  today, 
where  it  is  going,  and  what  it  can  do.  Others  presented  examples  of  applications  of  parallel 
computing  to  a  wide  number  of  problems  in  science  and  engineering. 

The  text  of  this  paper  attempts  to  summarize  the  presentations  and  to  capture  in 
short  form  the  general  and  specific  highlights  of  the  discussions  held  during  the 
symposium.  Appendix  A  shows  the  agenda  and  Appendix  B  lists  the  names  and  addresses 
of  the  attendees.  Finally,  Appendix  C  presents  copies  of  all  of  the  briefing  aids  used  by  the 
various  speakers. 

B.  SUMMARY OF THE KEYNOTE ADDRESS AND OVERVIEW
PRESENTATIONS

In her Keynote Address, Dr. Anita Jones, the Director, Defense
Research and Engineering, noted that the DoD is currently funding research in many
"Computational Technology Areas." These areas, along with the number of DoD-funded
researchers involved, are shown in Table 1. To support this research, the DoD has
established  10  computational  centers  in  the  United  States,  each  run  by  one  of  the  military 
services.  The  hardware  at  each  of  these  centers  is  impressive,  as  shown  in  Table  2.  The 
DoD  is  also  establishing  a  network  to  allow  access  to  its  machines  and  is  developing 
software  to  support  remote  computing.  The  real  challenge  to  using  these  machines,  as 
noted  by  Dr.  Jones,  lies  in  the  ability  to  produce  software  tools  and  to  create  an 
infrastructure  to  support  the  user.  She  identified  five  problems  that  must  be  overcome 
before  parallel  computing  becomes  a  widely  accepted  tool  among  DoD  users.  These  are: 

•  Simplify  remote  use  of  computational  resources 

•  Exploit  high-speed,  reliable  communications 

•  Harness  development  from  outside  the  DoD 

•  Build  scalable,  error  free  software 

•  Support  users  with  expert  consultation. 


Table 1.  Computational Technology Areas, Research Groups, and Researchers

Computational Technology Area                     No. Research Groups   No. DoD Researchers
Computational Structural Mechanics                67                    1,466
Computational Fluid Dynamics                      94                    2,071
Computational Chemistry and Materials Science     47                    723
Computational Electromagnetics and Acoustics      45                    828
Climate/Weather/Ocean Modeling                    17                    450
Signal/Image Processing                           37
Forces Modeling and Simulation/C4I                28                    895
Environmental Quality Modeling and Simulation                           377
Computational Electronics and Nanoelectronics     9                     79

Table 2.  Current Computational Capabilities in the DoD

Major Shared Resource Centers

•  Army Research (Cray 2 & KSR)

•  Air Force Dayton (Paragon)

•  Army Vicksburg (C-90 & Y-MP)

•  Naval Ocean (C-90 & Y-MP)

Distributed Centers

•  Naval R&D Center (Convex SPP & Paragon)

•  Naval Research Laboratory (CM-5)

•  Air Force Rome Laboratory (TBD)

•  Air Force Eglin AFB (Cray T3D)

•  Air Force Maui (IBM SP-2)

•  Army HPC Research Center (CM-5)

The  meeting  continued  with  a  presentation  on  Grand  Challenges  by  Dr.  Andrew 
White  of  the  Los  Alamos  National  Laboratory.  The  Grand  Challenges  were  identified  as 
problems  in  science  and  engineering  that  are  not  tractable  using  current  technologies,  but 
may  be  penetrable  with  future  computing  technologies.  To  augment  the  Grand  Challenges, 
a  list  of  National  Challenges  has  been  created  that  includes  problems  whose  solutions 
would  improve  the  quality  of  life  in  America.  Typical  Grand  Challenges  and  National 
Challenges  are  shown  in  Table  3.  Dr.  White  discussed  how  the  view  of  what  constitutes  a 
Grand Challenge has evolved since the inception of the term several years ago. Initially, all
of the Grand Challenges were scientific in nature. Now, the emphasis is shifting toward
problems that are still scientific in nature, but whose solution would have a marked benefit
to  society. 


Table  3.  Grand  and  National  Challenges 


Typical  Grand  Challenges 

•  Weather,  climate,  and  global  change 

•  Design  of  drugs 

•  Enhanced  oil  and  gas  recovery 

•  Semiconductor  research 

•  Superconductivity  research 

•  Transportation 

Typical  National  Challenges 

•  Crisis  and  Emergency  Management 

•  Digital  Libraries 

•  Electronic  Commerce 

•  Health  Care 

•  Energy  Management 


Parallel  computing  has  allowed  researchers  to  make  significant  progress  in  many  of 
the  Challenge  areas,  but  much  remains  to  be  accomplished.  Dr.  White  presented  examples 
of  progress  in  areas  such  as  predicting  the  topography  of  the  ocean  surface,  characterization 
of  novel  materials  using  molecular  dynamics  calculations,  and  understanding  flow  in 
porous  media.  All  three  of  these  examples  involve  research  on  larger  problems  whose 
solutions  would  indeed  benefit  society,  those  being  prediction  of  global  climate  change, 
molecular  (i.e.,  drug)  design,  and  recovery  of  oil  from  underground  porous  rock, 
respectively. 

Professor  Geoffrey  Fox  from  Syracuse  University  presented  a  talk  entitled  Dual- 
Use  Issues  for  High  Performance  Computing  and  Communications  (HPCC)  Defense 
Applications.  In  his  talk,  he  examined  the  roles  played  by  government,  the  manufacturing 
industry,  the  information  industry,  academia,  and  the  entertainment  industry  in  the  broader 
field  of  HPCC.  HPCC  includes  far  more  than  the  use  of  parallel  machines  that  were  the 
main  topic  of  this  meeting.  It  includes  other  types  of  computer  architectures  as  well  as  the 
exploding  field  of  information  storage,  retrieval,  and  exchange. 


Dr.  Fox  suggested  that  the  future  of  HPCC  will  be  driven  by  information  storage, 
retrieval,  and  exchange,  and  thus  the  market  for  computer  architectures  that  support  these 
functions  will  flourish.  The  demand  for  parallel  processing  machines  that  are  used  to  solve 
scientific  problems,  such  as  those  problems  outlined  in  Table  1,  will  only  be  a  small  part  of 
the overall market. Thus, he believes that the DoD will become heavily reliant on dual-use
technologies  and  systems  to  support  its  research. 

Dr.  Fox  also  noted  that  the  manufacturing  industry  (e.g.,  airplane  and  automobile 
manufacturers)  has  been  slow  to  adopt  parallel  processing  to  aid  in  the  manufacturing 
process.  This  is  because  of  the  expense  involved  in  purchasing  and  using  these  computers. 
As  noted  by  Dr.  Jones,  there  is  little  software  available  to  make  these  machines  easy  to  use, 
and  thus  the  cost  of  using  them  for  industry  is  unacceptably  high.  Thus,  Dr.  Fox 
suggested  that  the  U.S.  Government  provide  support  to  the  manufacturing  industries  that 
need  HPCC  in  order  to  ensure  U.S.  leadership  in  the  future. 

Professor  William  Dally  of  the  Massachusetts  Institute  of  Technology  outlined 
current  capabilities  and  trends  toward  future  capabilities  in  processors.  His  analysis  of  the 
state  of  processing  indicates  that  bandwidth  is  becoming  the  most  critical  resource  as 
opposed  to  processing  capability.  He  suggested  that  the  ideal  computer  architecture  for 
HPC  would  have  multiple  processors  with  the  illusion  of  a  shared  memory.  In  other 
words,  it  would  present  the  user  with  a  virtual  machine  that  has  a  shared  memory 
architecture,  even  though  the  physical  architecture  might  employ  distributed  memory. 
However,  there  was  not  general  consensus  among  the  symposium  participants  on  the 
suggestion.  Some  suggested  that  the  ideal  architecture  must  be  matched  to  a  particular 
problem;  others  suggested  that  a  balanced  architecture  would  evolve  and  would  have  some 
utility  for  all  problems. 
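
One way to picture the "illusion of a shared memory" is a thin layer that maps a single global address space onto the private memories of separate nodes. The Python sketch below is illustrative only. The class name `VirtualSharedMemory`, the block distribution of addresses, and the node and memory sizes are all assumptions made for this example; they do not describe a design presented at the symposium.

```python
class VirtualSharedMemory:
    """Toy model: one flat address space backed by per-node local memories."""

    def __init__(self, num_nodes, words_per_node):
        self.num_nodes = num_nodes
        self.words_per_node = words_per_node
        # Each node owns a private list of words (assumed block distribution).
        self.local = [[0] * words_per_node for _ in range(num_nodes)]
        # Counts accesses that would have to cross the interconnect.
        self.remote_accesses = 0

    def _locate(self, addr):
        """Translate a global address into (owning node, local offset)."""
        if not 0 <= addr < self.num_nodes * self.words_per_node:
            raise IndexError(addr)
        return divmod(addr, self.words_per_node)

    def read(self, addr, from_node=0):
        node, offset = self._locate(addr)
        if node != from_node:
            self.remote_accesses += 1
        return self.local[node][offset]

    def write(self, addr, value, from_node=0):
        node, offset = self._locate(addr)
        if node != from_node:
            self.remote_accesses += 1
        self.local[node][offset] = value


# The caller sees plain addresses; the layer decides where the data lives.
mem = VirtualSharedMemory(num_nodes=4, words_per_node=8)
mem.write(3, 11)    # address 3 is local to node 0
mem.write(20, 42)   # address 20 lives on node 2, offset 4
```

The `remote_accesses` counter is the point of the sketch: the programmer sees one memory, but every remote access consumes interconnect bandwidth, which is the resource Professor Dally identified as most critical.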

At the end of his talk, Professor Dally posed a question that stimulated significant
discussion.  Given  the  naturally  occurring  rapid  rate  of  growth  in  supercomputing 
capability  (e.g.,  processor  capability  is  doubling  about  every  2.8  years),  is  there  a  payoff  in 
trying  to  accelerate  this  growth  rate?  In  other  words,  are  there  problems  unsolvable  today 
that  are  so  urgent  that  we  should  invest  more  into  developing  even  faster  computers  than 
those  projected  for  the  next  few  years?  If  so,  what  are  these  problems?  In  the  ensuing 
discussion,  very  few  candidates  surfaced.  The  best  related  to  solving  time  urgent  problems 
where  the  user  of  the  data  cannot  wait  very  long  for  a  solution.  For  example,  the  National 
Weather  Service  cannot  wait  for  days  for  its  models  to  run  when  it  produces  its  daily 
weather  forecasts. 
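
To make the growth rate in Professor Dally's question concrete, the short Python sketch below converts a doubling time into a growth factor over a given number of years, and into the time needed to reach a given factor. The 2.8-year doubling time is the figure quoted above. The function names and the assumption of a constant exponential rate are illustrative and were not part of the talk.

```python
import math


def growth_factor(years, doubling_time=2.8):
    """Capability multiplier after `years`, assuming steady exponential growth."""
    return 2.0 ** (years / doubling_time)


def years_to_reach(factor, doubling_time=2.8):
    """Years of steady growth needed to gain the given capability multiplier."""
    return doubling_time * math.log2(factor)


decade_gain = growth_factor(10)            # gain from simply waiting a decade
wait_for_1000x = years_to_reach(1000)      # wait needed for a 1,000-fold gain
```

Under these assumptions, waiting a decade yields roughly a twelvefold gain, while a 1,000-fold gain takes close to 28 years. The question posed to the group amounts to asking which problems cannot afford that wait.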


C.  SUMMARY  OF  THE  TECHNICAL  SESSIONS 

At  this  point  in  the  meeting,  the  focus  of  the  talks  shifted  to  looking  at  certain 
specific  applications  of  parallel  computing.  Talks  were  grouped  into  six  sessions,  each  on 
a  particular  technical  area  of  interest  to  the  DoD.  Each  session  is  briefly  summarized 
below. 

1.  Nuclear Weapons

Two  talks  were  presented  during  this  session,  each  on  very  different  subjects.  One 
was  an  overview  of  the  role  HPC  could  play  in  predicting  the  performance  of  the  U.S.'s 
nuclear  weapons  in  the  stockpile  as  well  as  possible  new  designs.  This  role  becomes 
especially  critical  as  the  U.S.  moves  into  an  era  of  limiting  or  prohibiting  underground 
nuclear  tests.  The  second  talk  presented  some  very  detailed  hydrodynamics  calculations 
that  were  used  to  model  the  propagation  of  a  shock  wave  through  rock.  This  type  of 
calculation  was  necessary  to  support  the  U.S.  effort  to  monitor  and  verify  Soviet 
underground  nuclear  tests. 

2.  Information Technology/Signal Processing

The  talks  in  this  session  were  quite  different  in  nature.  The  first  was  a  high-level 
presentation  that  outlined  many  ARPA  programs  in  information  sciences.  Because  of  the 
vast  amounts  of  information  needed  by  today's  war  fighters,  ARPA's  program  offers  high 
payoff  for  U.S.  battlefield  efforts  in  intelligence,  targeting,  and  communications.  The 
second  presentation  illustrated  the  use  of  massively  parallel  machines  to  support  research  in 
signal  processing,  for  both  space-based  radars  and  for  ground-based  radio  astronomy. 

3.  Simulation  Based  Design 

The  talks  in  this  session  showed  how  computing  is  beginning  to  be  used  to  facilitate 
the  process  of  designing  and  manufacturing  large  systems.  In  the  past,  computing  might 
have  been  used  to  design  individual  parts  of  a  system,  such  as  optimal  wing  structures,  by 
examining  the  flow  of  air  around  a  wing.  Now,  by  creating  a  design  database  of  the 
system,  it  is  becoming  possible  to  alter  one  small  part  of  a  large  system  and  observe  how 
the  entire  system  must  be  adjusted  to  accommodate  that  change.  An  example  was  shown 
from  the  process  of  ship  design  in  which  the  designer  moved  a  structural  support  inside  a 
ship  and  the  new  design  programs  automatically  altered  the  locations  of  the  electrical 
cabling  and  ductwork  immediately,  thus  allowing  a  redesign  to  occur  in  seconds. 

4.  Materials/Processing

In  this  session,  progress  in  using  parallel  processing  to  predict  characteristics  of 
atoms  and  molecules  was  vividly  illustrated.  Researchers  are  now  able  to  examine 
molecules  and  their  reactions,  which  only  a  decade  ago  were  considered  too  large  to  study 
via  the  standard  techniques  of  quantum  chemistry.  Examples  that  are  of  interest  to 
manufacturing,  such  as  reactions  of  molecules  in  plasmas  used  for  surface  etching,  were 
shown.  Other  examples  included  studies  of  new  superconducting  materials. 

5.  Computational  Fluid  Dynamics 

In  this  session,  the  revolution  brought  about  by  parallel  processing  in 
Computational Fluid Dynamics (CFD) was apparent. Research was presented in which
scientists  are  able  to  predict  properties  of  flows  so  accurately  that  computation  has  now 
become  an  integral  part  of  experiments.  Calculations  are  now  so  accurate  in  some  areas 
that  they  are  used  to  guide  researchers  in  their  design  of  experiments  and  theoretical 
models. CFD calculations were shown which aid in designing spacecraft thrusters, thus
enhancing  the  maneuverability  of  satellites  on  orbit.  Calculations  were  presented  that  are 
allowing  scientists  to  gain  fundamental  understanding  of  the  characteristics  of  turbulent 
flows. 

The session ended with an overview of CFD research and massively parallel
processing  work  sponsored  by  the  Naval  Research  Laboratory.  The  areas  covered  were 
quite  broad  and  of  critical  importance  to  the  Navy.  For  example,  work  is  being  performed 
to  simulate  air  wakes  from  moving  ships,  wakes  from  torpedo  launches,  detonation  fronts 
from  explosions,  and  jet  noise  from  high  speed  civil  transport  planes.  In  all  of  these  areas, 
massively  parallel  processing  is  allowing  the  Navy  to  begin  solving  problems  that  were 
intractable  only  a  decade  ago. 

6.  Automatic  Target  Recognition 

Automatic  Target  Recognition  (ATR)  has  been,  and  will  continue  to  be,  a  large  area 
of  research  for  the  DoD.  Problems  range  from  finding  a  target  in  real  time  on  a  screen  to 
surveying  massive  amounts  of  imagery  data  for  a  particular  target  of  interest.  An  example 
of an ongoing ARPA target acquisition program, Tier II Plus Unmanned Air Vehicle, was
discussed.  The  session  concluded  with  a  presentation  on  research  in  ATR  using  the 
ARPA-funded  programming  tool,  Khoros. 


D.  HIGHLIGHTS  OF  DISCUSSIONS 


After hearing the talks, the symposium attendees, led by the moderator, Dr. Steven
Koonin,  discussed  the  role  and  future  of  parallel  computing  in  DoD  science  and  research. 
There  was  general  consensus  among  the  group  that  computing  is  now  crucial  to  doing 
scientific  research  in  many  areas  of  importance  to  the  DoD.  There  was  also  a  general 
consensus  that  the  United  States  currently  has  a  lead  in  the  technologies  involved  in  parallel 
processing,  but  it  is  not  obvious  that  the  lead  will  be  maintained.  The  attendees  also 
agreed  that  the  expertise  for  the  development  of  new  technologies  to  support  parallel 
computing  comes  from  academia  and  the  commercial  world,  and  not  from  within  the  DoD. 
Thus,  the  group  felt  that  the  DoD,  with  its  declining  budgets,  should  continue  to  reach  out 
to  the  commercial  world  and  to  academia  for  expertise  as  to  what  is  really  possible  with 
these  powerful  machines.  Many  attendees  thought  that  this  is  an  area  in  which  academia 
could  aid  the  DoD  in  finding  the  best  ways  to  implement  the  use  of  parallel  computing  for 
applications  related  to  scientific  and  engineering  research. 

Another  issue  that  was  raised  regarded  the  different  types  of  parallel  machines  being 
used  by  researchers.  On  the  one  hand,  there  are  the  groups  of  work  stations  that  can  be 
networked  together  to  operate  as  a  parallel  machine  by  using  experimental  software  such  as 
PVM,  Linda,  ISIS,  etc.  On  the  other  hand,  there  are  the  massively  parallel,  distributed 
memory  multiprocessor  machines,  such  as  Intel's  Paragon  and  Touchstone  Sigma,  or 
machines  by  Thinking  Machines  or  Connection  Machines.  The  recent  bankruptcy  of 
Thinking  Machines  led  the  group  to  question  the  commercial  viability  of  the  distributed 
memory  multiprocessor  machines,  especially  in  light  of  the  fact  that  there  exists  little 
software  to  support  computing  on  these  machines.  These  machines  are  rather  user- 
unfriendly,  and  therefore  the  group  was  concerned  that  a  large  commercial  market  may 
never  develop  for  such  machines  and  the  DoD  may  be  left  as  the  only  user.  With  such  a 
small  market,  the  DoD  would  then  be  forced  to  bear  the  entire  cost  of  any  research  and 
development  for  more  advanced  machines. 

The  group  agreed  that  new  techniques  of  visualizing  the  results  of  many  of  these 
applications  are  needed.  Because  these  new  computers  allow  researchers  to  examine 
problems  in  greater  dimensions  over  longer  time  scales,  plodding  through  a  pile  of 
computer  output  is  often  inadequate  as  a  way  for  researchers  to  understand  the  data  being 
produced.  New  techniques  for  visualization  have  proven  especially  useful  in  computational 
fluid  dynamics  and  molecular  design.  During  the  symposium  several  speakers  showed 
films  of  fluid  flows  over  a  variety  of  surfaces,  where  the  flows  were  calculated  using  CFD 
techniques. Only in the last 10 years, with image processing terminals becoming readily
available,  has  the  production  of  such  research-enhancing  aids  become  routinely  available  to 
academic  researchers.  The  session  attendees  felt  this  was  an  important  area  and  that  it 
should  not  be  neglected  relative  to  other  areas  of  research. 

There  were  a  number  of  topics  relating  to  parallel  computing  on  which  the  group 
raised  questions,  but  did  not  come  to  any  answer  or  consensus.  These  could  be  viewed  as 
issues  which,  if  resolved,  might  strengthen  the  role  and  position  played  by  parallel 
computing  in  the  DoD  and  United  States  as  a  whole. 

The  first  issue  relates  to  the  Grand  and  National  Challenges.  The  group  believed 
that  significant  progress  might  be  made  earlier  in  some  of  them  than  in  others.  They  also 
believed  that  some  of  the  Challenges  might  benefit  more  from  the  next  generation  of 
computers,  relative  to  others.  In  particular,  they  believed  that  more  powerful  computing 
capability  would  offer  a  significant  improvement  in  the  area  of  climate  change  research. 
Here,  the  large  general  circulation  models  that  are  run  today  are  not  using  small  enough 
grid  steps  over  the  surface  of  the  earth  and  up  through  the  atmosphere  that  are  necessary  to 
produce  results  with  the  desired  degree  of  accuracy.  The  group  suggested  that  the  United 
States  might  wish  to  give  consideration  to  determining  which  of  the  Grand  and/or  National 
Challenges  might  most  benefit  from  advances  in  computer  development,  and  initiating  a 
program  to  demonstrate  progress  in  solving  that  particular  problem. 
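
The link between smaller grid steps and computing capability can be illustrated with a rough cost estimate. The Python sketch below assumes that the work in a circulation model is proportional to the number of grid points times the number of time steps, and that the time step must shrink in proportion to the grid spacing. Both assumptions, along with the function name `relative_cost`, are simplifications introduced here for illustration and were not stated by the participants.

```python
def relative_cost(refinement, refine_vertical=True, scale_time_step=True):
    """Cost multiplier when grid spacing is divided by `refinement`.

    Assumes work ~ (grid points) x (time steps): two horizontal dimensions,
    optionally the vertical dimension, and optionally a time step that
    shrinks in proportion to the grid spacing.
    """
    horizontal = refinement ** 2
    vertical = refinement if refine_vertical else 1
    time_steps = refinement if scale_time_step else 1
    return horizontal * vertical * time_steps


halved_everywhere = relative_cost(2)                         # 2 * 2 * 2 * 2
halved_horizontal = relative_cost(2, refine_vertical=False)  # 2 * 2 * 2
```

Under these assumptions, halving the grid spacing in all three directions multiplies the work by 16, which suggests why resolution in such models tends to be limited by available computing capability.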

The  participants  were  also  mildly  concerned  about  the  recent  emergence  of  the 
National Challenges. Solving the National Challenges will require innovation in database
management,  information  storage  and  retrieval,  and  the  maintenance  and  acquisition  of 
timely  data.  These  problems  are  of  great  interest  in  the  burgeoning  information  industry, 
which  will  channel  its  research  into  those  areas.  On  the  other  hand,  solving  the  Grand 
Challenges  will  require  innovation  in  the  above  areas,  as  well  as  in  developing  more 
powerful  massively  parallel  machines.  Thus,  there  was  a  modest  level  of  concern  that  the 
country  will  focus  on  the  technologies  for  solving  the  National  Challenges  over  those  for 
the  Grand  Challenges;  therefore,  the  research  into  scalable,  massively  parallel  machines 
might  decline. 

The  DoD  must  also  be  able  to  take  advantage  of  the  progress  made  by  the 
information  and  entertainment  industries  as  they  drive  HPCC  technologies.  This  is  an  area 
that  should  not  be  ignored  by  the  DoD.  These  two  industries  are  huge  players  in  driving 
new  technologies  in  high  performance  computing  and  communications.  The  DoD  should 
remain abreast of the developments in this field, and leverage them to suit its needs
whenever possible. HPCC and all of the information-transfer capability that it entails will
be critical for battle management systems of the future. If progress is to be made in real-time
operational systems that use parallel computing (i.e., the battle management systems
of  the  future),  it  might  be  accomplished  in  these  industries. 

Another  issue  relates  to  the  role  played  by  massively  parallel  machines  in  the 
National  Defense.  If  the  commercial  market  for  these  machines  is  going  to  dwindle  over 
the  next  several  years,  the  DoD  must  decide  if  it  needs  to  support  the  industry.  The  group, 
as  a  whole,  does  not  have  access  to  much  of  the  DoD  work  that  uses  these  supercomputers. 
Given  that,  the  DoD  should  assess  its  needs  as  well  as  the  risks  involved  if  the  United 
States  were  to  lose  its  lead  in  the  field,  and  be  prepared  to  take  action,  if  deemed  necessary. 

Along  these  lines,  the  DoD  needs  to  know  the  cost  of  developing  reasonable 
software  for  MPPs.  Currently,  these  machines  do  not  generally  have  stable  operating 
systems,  good  debuggers,  and  utilities  that  allow  programmers  to  distribute  processes 
automatically  to  processors.  Until  the  cost  (and  benefit)  of  this  type  of  software  is 
understood,  it  will  be  difficult  to  predict  the  future  of  this  field. 

Another  issue  that  arose  comes  from  the  fact  that  DoD-funded  research  is  no  longer 
the  only  research  in  parallel  computing.  Although  the  DoD  was  crucial  in  establishing  the 
field,  parallel  computing  development  is  now  being  driven  not  only  by  the  needs  of  the 
DoD,  but  also  by  academic  and  information  industry  workers.  The  DoD  must  decide  what 
role,  if  any,  it  will  play  in  the  larger  U.S.  Government  effort  to  set  standards  and  provide 
the  infrastructure  for  the  world  of  parallel  computing  and  simulation. 

The  session  concluded  with  a  suggestion  that  the  DoD  might  wish  to  consider 
expanding  its  partnerships  with  academia  in  the  area  of  parallel  processing.  The  large  cost 
associated  with  research  in  parallel  computing  makes  the  manufacturing  industry  hesitant  to 
undertake new ventures. The DoD, on the other hand, can take a long-term view of research
and wait for a payoff. Due to cutbacks in industrial research, academia is now the
major  institution  performing  fundamental  research  in  America.  Thus,  the  participants 
suggested  the  DoD  continue  to  work  closely  with  academia  to  maintain  U.S.  leadership  in 
the  field  and  to  ensure  that  the  DoD's  future  needs  will  be  met. 

Grand Challenges

education and training programs

Plasma turbulence  Turbulence model

Gyrokinetic transport  Transport methods

Model resolves mesoscale eddies
JIP, NWAG, WAX, SEAMOS collaboration

Scale-Up to macro-systems is key


Hydrodynamics

Flow in Porous Media

Scale-up to macro-systems is key


Problem-solving  Pro 


ASSIMILATE 


TeraScale  Machine: 

CASA Gigabit Testbed

Nonlinear speed-up possible in
heterogeneous systems


OK, but what about the

visualization capabilities


Supercomputing

Advanced Computing

Still critical for success


Application  Characteristics 

Societal in effect


Characteristics of the

Agile

Accessible to customer

Environmental emergencies

Public Access to Government Information


Continue investment in Grand

Agile tools and resources

Begin  deployment  of  cognitive  tools 


Dual-Use  Applications 

Professor  Geoffrey  Fox 


Syracuse  University 
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The  Dual-Use  Philosophy 

This approach is built around the observation that, with today's
shrinking defense budgets, military products must share
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•  For  example,  multimedia  databases  can  be  used  in 

•  Military  Command  and  Control,  Health  Care,  Business  and 
Government  Decision  Making,  Education  and  Entertainment 
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An  Example  to  Illustrate  Importance  of 
_ Dual-Use  Philosophy _ 

A  Real  Example:  Need  a  data-base  system  for  a  military  vehicle  which  will 
remain  nameless 
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Opportunities  for  HPCC  in  the  Science 
and Engineering Simulation Arena
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•  Value  seems  clear  for  planning  and  real  time  control  but 

•  Industry  conservative  and  faced  with  growing  near  term 
competition
Surprisingly Difficult and Surprisingly Promising Areas
_ for  HPCC  in  Simulation _ 

□ The role of HPCC in Manufacturing is quite clear and will be
critical to
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•  Rather  buy  somewhat  more  powerful  computers  at 
somewhat  lower  cost 
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from  different  vendors 
•  The  way  of  doing  business  in  company 

• New job skills and cultures - the hardest problem
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Role  of  Government  and  DoD  in  HPCC 
_ Simulation  Applications _ 

The limited near-term industrial use of HPCC implies that it is critical
for Government and DoD to support and promote
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•  Currently  federal  Initiatives  are  correctly  involving  Industry  in 
more  major  fashion  than  before  but  focussing  on  short  term 
needs 
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The  HPCC  Software  Industry  is  not  Viable 
_ _ in  Simulation  Area _ 

An HPCC Software Industry is essential if the HPCC field is to become
commercially successful
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Why  is  Dual-Use  Critical  for  National  Challenges? 

□ The National Challenges have been correctly identified as the
major HPCC opportunity and there is a
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So  Dual-use  or  Multi-use  development  of  modular  HPCC 
technologies,  services  and  applications  essential 
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Dual-Use  Applications  of  InfoVision  to  Command  and 
_ Control:  Video  Information  on  Demand _ 

Basic  InfoVision  Service  is  Tens  of  thousands  of  hours  of  Digital  Video 
with  Index  (Video  Browsing)  formed  from 
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Dual-Use  Applications  of  InfoVision  to  Command  and 
_ Control:  Image  Information  on  Demand _ 

Basic  InfoVision  Service  is  a  set  of  Images  stored  In 
Multiresolution  (Kodak  Photo-CD)  format  indexed  by: 
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Dual-Use  Applications  of  InfoVision  to  Command 
_ and  Control:  Text  Information  on  Demand _ 

Basic  InfoVision  Service  is  large  scale  Alphabetic  and  Numerical  database 
supported by
Defense Application allows mission planners to navigate potential
battlefields to find unexpected hazards and opportunities
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Dual-Use  Applications  of  InfoVision  to  Command  and 
Control:  Correlation  Analysis  for  Spatial  Reasoning 

Basic InfoVision service is a set of spatially labelled information
combined with appropriate decision support tools
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geographic  browsing  can  be  used  to  study  possible  export  and 
marketing  strategies  to  choose  best  countries  as  targets  for  a 
particular  product 
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Dual-Use  Applications  of  InfoVision  to  Command  and 

Control:  Simulation  on  Demand 
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Categories  of  Industrial  and  Government  Applications  of  HPCC  (with 
_ reference  to  academic  applications) _ 

□ Define information generally to include both CNN headline news and the
insights on QCD gotten from lattice gauge theories

□ Information Production
Command and Control for Military
Concurrent Engineering and Agile Manufacturing
Integrates  Information  Production,  Analysis,  Access 
Largest long term market for MPP
Tables of Industrial HPCC Applications 1 to 4: SIMULATION


Columns: Item; Application Area and Examples; Problem Comments; Machine and Software

Item 1. Computational Fluid Dynamics
  Problem comments: PDE, FEM; Turbulence; Mesh Generation
  Machine and software: SIMD; MIMD for irregular, adaptive; HPF(+); unclear for adaptive irregular mesh

Item 2. Structural Dynamics
  Problem comments: PDE, FEM; dominated by vendor codes (e.g. NASTRAN)
  Machine and software: MIMD as complex geometry; HPF(+)

Item 3. Electromagnetic Simulation
  Examples: Antenna Design; Stealth Vehicles; Noise in high frequency circuits; Mobile Phones
  Problem comments: PDE; Moment method (matrix inversion dominates); later FEM, FD?; Fast Multipole
  Machine and software: SIMD, HPF; SIMD, MIMD, HPF(+)

Item 4. Scheduling
  Examples: Manufacturing; Transportation (dairy delivery to military deployment); University Classes;
    Airline scheduling of crew, planes in static or dynamic (Midwest snowstorm) cases
  Problem comments: Expert systems and/or Neural Networks, Simulated Annealing; Linear Programming (hard sparse matrix)
  Machine and software: MIMD (unclear speedup), Asyncsoft; SIMD, HPF; MIMD, HPF+ ?

Abbreviations:
PDE        Partial Differential Equation
FEM        Finite Element Method
FD         Finite Difference
ED         Event Driven Simulation
TS         Time Stepped Simulation
CFD        Computational Fluid Dynamics
VR         Virtual Reality
HPF        High Performance Fortran [HPFF92a]
HPF+       Natural Extensions of HPF [SCCS-255]
MPF        Fortran plus message passing for loosely synchronous software
Asyncsoft  Parallel Software System for (particular) class of asynchronous problems

Note on Language: HPF, MPF are illustrative for Fortran;
one can use parallel C, C++ or any similar extensions of data parallel or message passing languages

Tables of Industrial HPCC Applications 5 to 8: SIMULATION


Columns: Item; Application Area and Examples; Problem Comments; Machine and Software

Item 5. Environmental Modeling
  Examples: Earth/Ocean/Atmospheric Simulation
  Problem comments: PDE, FD, FEM; sensitivity to data
  Machine and software: SIMD; MIMD for irregular, adaptive mesh; HPF(+); unclear for adaptive irregular mesh

Item 6. Environmental Phenomenology
  Examples: Complex systems, e.g. lead concentration in blood
  Problem comments: Empirical models; Monte Carlo and Histograms
  Machine and software: Some SIMD, MIMD more natural; HPF

Item 7. Basic Chemistry
  Examples: Chemical Potentials; Elemental Reaction Dynamics
  Problem comments: Calculate matrix elements; matrix eigenvalue, multiplication, inversion
  Machine and software: MIMD (maybe SIMD); HPF

Item 8. Molecular Dynamics
  Examples: Biochemistry; Discrete Simulation Monte Carlo in CFD (DSMC); Particle in the cell (PIC)
  Problem comments: Particle dynamics with irregular cutoff forces; Fast Multipole methods;
    mix of PDE and particles in PIC or DSMC
  Machine and software: HPF(+) or MPF for fast multipole

Abbreviations:
PDE        Partial Differential Equation
FEM        Finite Element Method
FD         Finite Difference
ED         Event Driven Simulation
TS         Time Stepped Simulation
CFD        Computational Fluid Dynamics
VR         Virtual Reality
HPF        High Performance Fortran [HPFF92a]
HPF+       Natural Extensions of HPF [SCCS-255]
MPF        Fortran plus message passing for loosely synchronous software
Asyncsoft  Parallel Software System for (particular) class of asynchronous problems

Note on Language: HPF, MPF are illustrative for Fortran;
one can use parallel C, C++ or any similar extensions of data parallel or message passing languages
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Tables  of  Industrial  HPCC  Applications  9  to  13:  SIMULATION 


Application  Area  and 
Examples 


Economic  Modeling 

•  Real  Time 
Optimization 

•  Mortgage  backed 
securities 

•  Option  Pricing 


Network  Simulations 

•  Electrical  Circuit 

•  Microwave  and  VLSI 
Chip 

•  Biological  (neural)  Circuit 


Particle  Transport 
Problems 


Graphics  (rendering) 
Hollywood 
Virtual  Reality 


Integrated  Complex 
Systems  Simulations 

•  Defense  (SIMNET, 
Flight  Simulators) 

•  Education  (SIMCITY) 

•  Multimedia/VR  in 
Entertainment 

•  Multiuser  virtual  worlds 

•  Chemical  &  Nuclear 
Plants 


Problem  Comments 


Individual  (Monte  Carlo) 


Full  simulations  of 
portfolios 


Sparse  matrices; 

Zero  structure  defined  by 
network  connectivity 


Monte  Carlo  methods  as 
in  neutron  transport  for 
explosion  Simulations 


Several  operational 
Parallel  Ray  Tracers 
Distributed  model  hard 


Event  driven  (ED)  and 
Time  Stepped  (TS) 
Simulations. 

Virtual  Reality  Interfaces. 
Database  backends. 
Interactive 


Machine  and  Software 


SIMD, HPF


MIMD,  SIMD  Integration 
software 


MIMD 

HPF  for  matrix  elements 
MPF/library  matrix  solve 


PDE 

Partial  Differential  Equation 

VR 

FEM 

Finite Element Method

HPF 

FD 

Finite  Difference 

HPF+ 

ED 

Event  Driven  Simulation 

MPF 

TS 

CFD 

Time  Stepped  Simulation 

Computational  Fluid  Dynamics 

Asyncsoft 

Note on Language: HPF, MPF are illustrative for Fortran

one can use parallel C, C++ or any similar extensions of data parallel or message passing languages

MIMD 

HPF 


MIMD 

Asyncsoft  for  distributed 
database 


HPF  for  simple  ray 
tracing 

MPF  for  best  algorithms 


Timewarp 

or  other  Event  Driven 
(ED)  Simulation  needs 
Appropriate  Asyncsoft 


Integration 

Software 

Database 


HPF+  for  TS  simulation 


Virtual  Reality 

High  Performance  Fortran  [HPFF92a] 

Natural  Extensions  of  HPF  [SCCS-255] 

Fortran  plus  message  passing  for  loosely  synchronous 
software 

Parallel  Software  System  for  (particular)  class  of 
asynchronous  problems 
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Tables  of  Industrial  HPCC  Applications 

14  to  18 

Information  Analysis  -  "Data  Mining" 


Columns: Item; Application Area and Examples; Problem Comments; Machine and Software

Item 14. Seismic and Environmental data analysis
  Problem comments: No oil in NY State. Parallel computer already important
  Machine and software: SIMD, maybe MIMD needed; HPF

Item 15. Image Processing
  Examples: Medical Instruments; EOS (Mission to Planet Earth); Defense Surveillance; Computer Vision
  Problem comments: Commercial applications of defense technology. Component of many
    Information Integration applications, e.g. Computer Vision in Robotics
  Machine and software: Metacomputer; low level: SIMD, HPF; medium/high level: MIMD, HPF(+);
    Software Integration, Asyncsoft, Database

Item 16. Statistical Analysis Packages (libraries)
  Problem comments: Optimization; Histograms (see category 4)
  Machine and software: HPF+ adequate for many libraries

Item 17. Healthcare Fraud, Inefficiency, Securities Fraud, Credit Card Fraud
  Problem comments: Linkage analysis of database records for correlations
  Machine and software: SIMD or MIMD; Parallel Relational Database Access (plus category 16)

Item 18. Market Segmentation
  Problem comments: Sort and classify records to determine customer preference by region (city -> house)
  Machine and software: Some cases are SIMD; Parallel Database (plus category 16)
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Table  of  Industrial  Applications  19  to  24  for  Information  Access 
InfoVision - Information, Video, Imagery and Simulation on Demand


Transaction 
Processing 
•ATM  (automatic 
teller  machine) 

Database-most 
transactions  short.  As 
add  "value"  this 
becomes  Information 
Integration 

Collaboration 

Research  Center  or 

•  Telemedicine 

doctor(s)  -  patient 

•  Collaboratory  for 

interaction  without 

Research 

regard  to  physical 

•  Education 

location 

•  Business 

Text  on  Demand 

Multimedia  database 

• Digital (existing)

Full  text  search 

libraries 

•  ERIC  Education 
database, 

•  United  Nations  - 
Worldwide 
newspapers 


n  Demand 


•  Movies,  News 
(CNN  Newsource 
&  Newsroom), 

•  Current  cable, 

•  United  Nations  - 
Policy  Support 


Kodak  GIODE 

•  "clip  art"  on 
demand 

•  Medical  images 

•  Satellite  images 


Simulation  on 


•  Education, 
Tourism,  City 
planning, 

•  Defense  mission 
planning 


Multimedia  Database 
Interactive  VCR , 
Video  Browsing,  Link 
of  video  and  text 
database 


Multimedia  database 
Image  Understanding 
for  Content  searching 
and  (terrain)  medical 
feature  identification 


Multimedia  map 
database  Generalized 
flight  simulator 
Geographical 
Information  System 


Embarrassingly 

Parallel 


Asynchronous 


Embarrassingly 

Parallel 


MIMD 

Database 


High  Speed 
Network 


MIMD 

Database 


Embarrassingly 

MIMD 

Parallel  for  multiple 

Database 

Users 

Video  Editing 
Software 

Interesting  parallel 
compression 

SIMD 

Compression 

Metaproblem 

MIMD  but 

Embarrassingly 

much  SIMD 

Parallel  plus  Loosely 
Synchronous  Image 
Understanding 

image  analysis 

Synchronous  terrain 

SIMD  terrain 

rendering  with 

engine  (parallel 

Asynchronous 

rendering) 

Hypermedia 

MIMD 

database 

Integration 

software 
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Second  Five  Table  Entries  for  Applications  25  to  33: 

Information Integration
Abbreviations used in tables of industrial

Applications  of  HPCC 
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Core  Enabling  HPCC  Software  Technologies 
for Information Production (Simulation)


□ PVM, Express, Linda, MPI

□ ISIS (Cornell)

□ High Performance Fortran (HPF) Compiler

□ High Performance C, C++ Compiler

□ HPF Extensions - PARTI

□ Parallel / Distributed Computing Runtime Tools

□ ADIFOR (Differentiate Fortran Code)

□ AVS and Extensions

□ High Performance Fortran Interpreter

□ Image Processing

□ Parallel Debugger

□ Parallel Performance Visualization

□ Parallel Operating Systems

• I/O

• Scheduling

□ Virtual Reality

□ Event Driven Simulator
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Core  Enabling  HPCC  Technologies 
Information Analysis, Access, Integration

Parallel  (Relational)  Database  e.g.  Oracle  7.0 
Object  database 
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Core  Enabling  HPCC  Technologies  Information 
Analysis,  Access,  Integration  continued 

□  Collaboration  Services 

•  Multi  user  video  conferencing 

•  Electronic  whiteboards,  etc. 
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Some  Anecdotes  from  New  York  State  InfoMall  Survey 
_ of  Industrial  HPCC  Applications _ 

□ Carrier (Syracuse) competing with Japanese who build quiet air
conditioners. Need 3-D CFD simulations including acoustics.




HPCC  Industrial  Applications  In  Environmental  Modeling 

□ Several diverse applications
Dual-Use Image Processing and Related Technologies
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OLTP  -  Online  Transaction  Processing 
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More  Anecdotes  from  the  Insurance  Industry 
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Image  From  Joint  Stars  (Grumman,  PAR  Technology) 


"The  Mother  of  All  Retreats" 
(Secretary  Cheney) 
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Rome  Laboratory  Parallel  Software 
Engineering  Cooperative 
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corporation 

New Optimization Methods and Monte Carlo
Simulations of growing interest
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MADIC  Industrial  Consortium  Members  as  of  1993 
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The  USMADE  Project  of  MADIC 
_ Industrial  Consortium _ _ 

United States Multidisciplinary Analysis and Design Environment
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Software  Bus  Structure  of  USMADE 
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The  Mapping  of  Heterogeneous  Metaproblems  onto 
_ Heterogeneous  Metacomputer  Systems 
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INFOMALL  and  the  National  Information  Infrastructure 


Media 

Entertainment 

Telephone 

Cable  (communications) 
Computing 
Video  game 


industries  will 
partner  to  provide 


Digital  Information  Services 


initially provide


Entertainment  (Video  on  Demand) 

News 

Interactive  TV  -  Shopping  etc. 


These will justify
capital investment
to create


National Information Infrastructure (NII)


How  can  these  be  exploited  for  defense,  business  use, 
academic  research,  education,  creation  of  HPCC 

Software Industry?
NII Compute & Communications Capability in Year 2000 -> 2005
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InfoVision  is  a  Set  of  HPCC  Applications 
_ on  the  NII/GII _ 

Information, Video, Imagery and Simulation on Demand
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Architectures 


Professor  William  J.  Dally 
Massachusetts  Institute  of  Technology 


What  can  Parallel  Computing  do  for 
Defense  Science  and  Engineering? 


William  J.  Dally 

Massachusetts Institute of Technology

October  31,  1994 


Outline 

■ Technology Trends
  - Why parallel computing?
■ Parallel Computer Architecture
■ Leveraging Commodity Technology
■ Case Study
■ Futures


Trends  in  Integrated  Circuit 
Technology 


[Figure: trend plot; horizontal axis: Year]


Scaling  of  Circuit  Technology 


[Figure: chip area vs. year; annotated points: 10Gλ² ('98), 2Gλ² ('94), 300Mλ² (64-bit Proc.)]


Bandwidth vs. Capacity of DRAMs

■ Capacity increases 50% per year vs 10% for BW
■ Time constant growing at 40% per year
■ Move to high bandwidth parts resets clock 5 years


Scaling of Arithmetic, Memory, and Bandwidth

■ Arithmetic performance improves with area and speed (70% per year)
■ Memory capacity improves with area (50% per year)
■ Bandwidth improves with speed (10% - 20% per year)
■ Bandwidth is becoming the most critical resource
■ Arithmetic (processing) is becoming the least critical
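The growth rates quoted on the DRAM slide can be checked with a few lines of arithmetic. The sketch below assumes that "time constant" means capacity divided by bandwidth (roughly, the time needed to read an entire part); that reading, and the function names, are illustrative assumptions rather than anything stated on the slide.

```python
def time_constant_growth(capacity_rate, bandwidth_rate):
    """Yearly growth factor of capacity/bandwidth (assumed meaning of 'time constant')."""
    return (1.0 + capacity_rate) / (1.0 + bandwidth_rate)


def time_constant_after(years, capacity_rate=0.50, bandwidth_rate=0.10):
    """Factor by which capacity/bandwidth has grown after a number of years."""
    return time_constant_growth(capacity_rate, bandwidth_rate) ** years


# Slide figures: capacity +50% per year, bandwidth +10% per year.
yearly = time_constant_growth(0.50, 0.10)
five_years = time_constant_after(5)

if __name__ == "__main__":
    print(f"yearly growth of capacity/bandwidth: {yearly:.3f}")
    print(f"growth over five years: {five_years:.2f}x")
```

With these inputs the ratio grows by a little over a third each year, in the neighborhood of the slide's rounded 40%, and by nearly a factor of five over five years.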




Cost of Bandwidth vs. Distance

Cost of 1GB/s vs distance

Packaging Level      Dist     Cost ($)
Local on chip        2mm         0.06   wire
Global on chip       15mm        0.50   wire
Between chips        10cm       12.00   chip pins
Between boards       30cm       50.00   connectors
Between cabinets     10m       200.00   cables+conn
Between buildings    200m     2000.00   fiber+xcvrs


Technology in 2010

■ CMOS Integrated Circuits
  - 0.05µm CMOS, 3.5cm x 3.5cm (2Tλ²)
  - 2GHz clocks
■ Processors
  - 4K per chip (500Mλ²/proc)
  - 8TFLOPS/chip (@ 2GHz)
■ Memory
  - 20G bits/chip (@ 100λ²/b)
■ 5K pins/chip
  - 500µm area grid
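The 2010 projections are mutually consistent if chip area is measured in squares of a length unit λ equal to half the drawn feature size. The sketch below checks that arithmetic. The half-feature-size definition of λ, the reading of "4K" as 4096, and the assumption of one floating-point operation per processor per clock are all illustrative assumptions, not statements from the slide.

```python
FEATURE_UM = 0.05             # feature size from the slide, in micrometres
LAMBDA_UM = FEATURE_UM / 2.0  # assumption: lambda is half the feature size
CHIP_SIDE_UM = 3.5e4          # 3.5 cm chip edge, in micrometres
CLOCK_HZ = 2e9                # 2 GHz clock
PROCESSORS = 4096             # assumption: "4K" processors means 4096
PIN_PITCH_UM = 500.0          # area-grid pin pitch

# Chip area in units of lambda squared.
chip_area_lambda2 = (CHIP_SIDE_UM / LAMBDA_UM) ** 2

# Silicon area available to each processor.
area_per_processor = chip_area_lambda2 / PROCESSORS

# Peak rate, assuming one floating-point operation per processor per clock.
peak_flops = PROCESSORS * CLOCK_HZ

# Pins available on a full area grid at the given pitch.
pins = (CHIP_SIDE_UM / PIN_PITCH_UM) ** 2

if __name__ == "__main__":
    print(f"chip area: {chip_area_lambda2:.2e} lambda^2")
    print(f"area per processor: {area_per_processor:.2e} lambda^2")
    print(f"peak rate: {peak_flops:.2e} FLOPS")
    print(f"area-grid pins: {pins:.0f}")
```

Under these assumptions the chip comes to about 2 x 10^12 λ², each processor gets a little under 5 x 10^8 λ², the peak rate is about 8 x 10^12 FLOPS, and a 500µm grid yields about 4900 pins, in line with the rounded slide values.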


Then  and  Now 


              1994     2010     Units    Δ
Area          2G       2T                1K
Frequency     200M     2G       Hz       10
Processors    4        4K                1K
FLOPS         800M     8T                10K
Pins          500      5K                10
Global wires  100K     1M       b/rrF    10
PinBW         100G     10T      b/s      100
Memory        16M      16G      b        1K
Memory/PBW    160µ     1.6m     s        10
FLOPS/PBW     8m       800m     OP/b     100
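The last two rows of the "Then and Now" comparison are derived quantities: memory capacity and arithmetic rate, each divided by pin bandwidth. The sketch below recomputes them from the raw rows. It assumes the 2010 FLOPS entry reads 8T (consistent with the 8 TFLOPS/chip projection) and that "PBW" means pin bandwidth in bits per second; the dictionary layout and names are hypothetical.

```python
# Raw rows as (1994 value, 2010 value).
ROWS = {
    "pin_bw_bits_per_s": (100e9, 10e12),
    "memory_bits": (16e6, 16e9),
    "flops": (800e6, 8e12),  # assumption: the 2010 entry is 8T
}


def per_pin_bandwidth(quantity, column):
    """Divide a quantity by pin bandwidth for column 0 (1994) or 1 (2010)."""
    return ROWS[quantity][column] / ROWS["pin_bw_bits_per_s"][column]


# Seconds needed to move the whole memory across the pins.
memory_per_pbw = [per_pin_bandwidth("memory_bits", c) for c in (0, 1)]

# Operations available per bit crossing the pins.
flops_per_pbw = [per_pin_bandwidth("flops", c) for c in (0, 1)]

if __name__ == "__main__":
    print("Memory/PBW (s):", memory_per_pbw)
    print("FLOPS/PBW (op/bit):", flops_per_pbw)
```

Both derived ratios grow between the two columns, by about 10x and 100x respectively, which is the quantitative basis for the claim that bandwidth rather than arithmetic becomes the scarce resource.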

Implications  of  Technology  Trends 

■ Processors are cheap relative to memory
  - Many processors in a system
  - Increase P:M ratio - cost balanced design

■ Bandwidth is expensive and gets more so with distance
  - Optimize use of bandwidth, not processing
  - Exploit locality - use bandwidth where it's cheap
  - Non-uniform memory access


Outline

■ Technology Trends
■ Parallel Computer Architecture
  - Grain Size
  - Bandwidth ratios
  - Mechanisms
■ Leveraging Commodity Technology
■ Case Study
■ Futures


A Generic Parallel Computer


Uniform  BW Architecture  Does 
not  Scale 

■ Global bandwidth scales as V^(2/3)
■ .006% of silicon provides processing to match BW


Clustered  Architecture  Exploits 
Locality 

■ 12% of silicon provides 4PF processing
■ Can emulate similar cost uniform BW architecture
■ 2000x more cost effective for local computations


SIZ  Ka  p•r<tltp■4(Wc1»p■ 
ZCFmdMBpcrPE 


Mechanisms 

■ Applications should see latency and bandwidth set by physical limits on short messages
  - e.g., 300ns and 300MB/s vs. 50µs and 2MB/s

■ Sequential programs should run well and be incrementally parallelized

■ Requires
  - Shared address space
  - User-level communication
  - Cache coherence - to automatically exploit locality
  - Fast synchronization - local and global
  - Latency tolerance


Agility  Enables  Parallelism 


Task Size (Operations)


The Impact of Architecture

■ A good architecture reduces the parallel software problem to fundamentals
  - Identify parallelism
  - Exploit locality

■ A bad architecture burdens the programmer with incidentals
  - Manage multiple address spaces
  - Coalesce messages and tasks to avoid startup overhead


Outline

■ Technology Trends
■ Parallel Computer Architecture
■ Leveraging Commodity Technology
■ Case Study
■ Futures


Technologies  -  Commodity  and  Custom 

Technology                        NRE      R        Q
Integrated Circuit Fabrication    $500M    $100     5M
Processor Design                  $20M     $100     200K
Parallel Computer System          $50M     $1M      50
System Software                   $100M
Application Suite (100 codes)     $500M    $100K    5K
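One way to read the commodity-versus-custom table is to amortize the one-time cost of each technology over the number of units sold. The sketch below assumes the three numeric columns are non-recurring cost, recurring cost per unit, and quantity; the column meanings and the formula (one-time cost divided by quantity, plus per-unit cost) are an interpretation offered for illustration, not something the slide states. The System Software row is omitted because it lists only one figure.

```python
# (one-time cost in $, per-unit cost in $, quantity), under the assumed column meanings.
TECHNOLOGIES = {
    "Integrated Circuit Fabrication": (500e6, 100.0, 5e6),
    "Processor Design": (20e6, 100.0, 200e3),
    "Parallel Computer System": (50e6, 1e6, 50.0),
    "Application Suite (100 codes)": (500e6, 100e3, 5e3),
}


def amortized_unit_cost(one_time_cost, per_unit_cost, quantity):
    """Per-unit cost once the one-time cost is spread over all units."""
    return one_time_cost / quantity + per_unit_cost


unit_costs = {
    name: amortized_unit_cost(*figures) for name, figures in TECHNOLOGIES.items()
}

if __name__ == "__main__":
    for name, cost in unit_costs.items():
        print(f"{name}: ${cost:,.0f} per unit")
```

Under this reading, the high-volume commodity rows amortize to a few hundred dollars per unit, while the low-volume parallel system doubles its per-unit price, which is consistent with the slide's argument for building on commodity technology.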


Exploiting  Locality 

■ Inter-Processor Networks
  - Provide high 'Neighbor' BW
  - Latency and BW should approach physical limits

■ Data placement
  - Partition data so majority of accesses are local
  - Need 3200:160:16:1 (Local, Cluster, Neighbor, Global)

■ Caching
  - Automatic, coherent management of entire memory can
    automatically replicate and migrate data to exploit locality

■ Limited by communication requirements of algorithms
  - e.g., FFT needs O(N) communication for O(N lg N) ops


Tolerating  Latency 


■ Keep local resources busy while waiting for global requests

■ Multithreading
  - Multiplex several "virtual processors" on hardware
  - Zero-cost context switch when waiting
  - Requires excess parallelism to cover latency
    Pex = Max(1, T x g)

■ Pipelined memory system
  - Memory system must support many outstanding requests
  - Flow-control required to avoid deadlock/livelock


Parallel  Software 

■ Most problems have lots of parallelism
  - FFT O(N), LUD O(N), Eval Model O(N)
  - Almost no "serial" problems (or real serial fraction)
    » this is just code that hasn't been converted

■ Parallel software hard today because
  - Machines have poor communication and synchronization
    » Need fast networks, low overhead interfaces, synchronizing memory
  - Management of locality and bandwidth is not well understood
    » Need global coherent memory, bandwidth optimized applications


Parallel Software (cont.)

■ Little economic incentive to develop parallel software today
  - 300M$ parallel market vs. 100G$ serial market
  - Tools are primitive


Invest  in  what  the  market  ignores: 
Scalability 


■ Desktop/Settop systems drive technology
  - process technology
  - processor designs
  - memory designs
  - small-scale software

■ Ignores scalability
  - global packaging technology
  - latency hiding mechanisms
  - exploitation of locality
  - scalable software
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Outline 

■ Technology Trends
■ Parallel Computer Architecture
■ Leveraging Commodity Technology
■ Case Study
  - MIT J-Machine
  - Cray T3D
■ Futures


Outline 

■ Technology Trends
■ Parallel Computer Architecture
■ Leveraging Commodity Technology
■ Case Study
■ Futures
  - Parallelism from the desktop up
  - Obstacles and opportunities


Parallelism  on  the  Desktop 

■ Year 2000 - 25Gλ² chips - 4P+M
  - 4 processors with 8MB each (2%P)
    » PCs with single chip (4P)
    » Servers/workstations with 10s of chips (100P)
    » Supercomputers with 10* to 10* chips (10*P)

■ Systems share
  - IC fabrication technology
  - Core processor and memory design
  - Modestly parallel software modules

■ High end systems also need
  - Latency hiding and locality features in processor
  - Massively parallel software modules
  - High bandwidth global networks


Parallel  Computer  Companies  are 
Struggling 


■ Their products are networked workstations
  - e.g., CM5, Paragon
  - It's cheaper to buy the workstations and network them

■ Converting sequential software is hard
  - w/o mechanisms
    » shared memory, latency hiding, and cache coherence
  - w/o fast access to global memory (G << L)

■ There are few 3rd party applications

■ They focus only on the vanishing high-end market


Opportunities

■ >10x performance/price gain by increasing P:M

■ Incremental migration of programs
  - Mechanisms for locality, latency, shared address space
  - Bandwidth balance {L^GJ

■ Parallelism is moving downward in the market
  - SMP servers with up to 64 processors today
  - parallel desktop machines in a few years


Conclusions

■ Technology trends motivate parallelism
  - 70% per year increase in area x speed
  - Non-uniform scaling of processor, memory, and communication

■ Parallel architecture enables software
  - Cost-balanced granularity
  - Bandwidth hierarchy fL
  - Mechanisms

■ Invest in what the market ignores
  - Use IC technology, sequential software
  - Develop mechanisms, parallel software

■ Supercomputers are time machines
  - Is this still valuable?
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NUCLEAR  WEAPONS 


Advanced  Computation  for  Stewardship  of  the 

Stockpile

Dr.  Victor  Reis 
Department  of  Energy,  DOE 

Dr.  Andrew  White 
Los  Alamos  National  Laboratory 
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Accelerated  Strategic 
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Components 
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A/o  integrated  weapons  tests  to  provide 
validation  for  modeling  and  simulation 
DP  can  have  a  major  effect  on  industry,  if 


Possible  Problems  are ... 
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Physical  Phenomena 
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Grand  Challenges? 
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•  Storage:  80  petabytes 

•  Data  rate:  800  Gigabytes/sec 


Grand  Challenges? 
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•  Storage:  256  petabytes 

•  Data  rate:  1.6  Terabytes/sec 


Grand  Challenges? 
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Data  rate:  800  Gigabytes/sec 


Simulation 
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Environmental 
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focus  of  the  data  storage 
Entertainment will provide
visualization  capabilities 


Supercomputing
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High-end  systems  $  o.5B 


TeraScale  Machine 
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What  we  need  is  a ... 


ASCI  PROGRAM 

Investment  in  technology  and  industries 


All  partnerships  linked 
Several  partners,  selection  criteria 
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Network,  and  storage:  $5M 
Security:  $1M 


Issues 
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Application  of  MPP  to  the  Solution  of 
Environmental  Modeling  Problems  and  Nuclear 

Test  Ban  Verification 


James  F.  Lewkowicz 


Phillips  Laboratory,  Hanscom  AFB 


Application  of  MPP  to  the  Solution  of  Environmental  Modeling 
Problems  and  Nuclear  Test  Ban  Verification 


James  F.  Lewkowicz 
Phillips  Laboratory  (AFMC) 
Geophysics  Directorate 
29  Randolph  Road 

Hanscom AFB, Massachusetts 01731-3010


Our  lack  of  understanding  regarding  a  variety  of  important  defense  related 
problems  in  many  cases  stems  from  the  inability  to  realistically  model  complicated 
geophysical  processes  that  govern  the  problems.  Two  examples  of  such  problems  are 
understanding  seismic  wave  radiation  from  underground  nuclear  explosions  and 
modeling  processes  that  control  subsurface  hazardous  waste  plumes.  These  areas  address 
DoD  high  priority  requirements  in  counter/nonproliferation  (e.g.,  Comprehensive  Test
Ban  Treaty  monitoring)  and  the  environmental  remediation  of  hazardous  waste  sites  at 
DoD  installations,  respectively.  Research  in  environmental  remediation  also  supports 
DoD  counterproliferation  efforts  to  identify  and  characterize  underground  structures. 
Additionally, these areas have strong dual use applications in the oil exploration and
engineering  industries. 

The  application  of  supercomputers  known  as  Massively  Parallel  Processors 
(MPP)  to  the  solution  of  these  important  DoD  problems  will  be  crucial.  In  fact,  the 
introduction  of  MPP  technology  has  already  made  impacts  in  the  solution  of  complex 
problems  in  geophysics,  specifically  in  the  solution  of  seismic  wave  propagation  in 
complicated  Earth  models  and  the  processing  of  ground-penetrating  radar  for  the 
detection of hazardous wastes. This paper discusses some details of these advances
and outlines the challenges that lie ahead.

1.  INTRODUCTION 

The  use  of  computers  has  revolutionized  the  way  scientific  enquiries  are  made, 
and  now  calculations  which  were  once  considered  untenable  are  simple  exercises.  The 
evolution  of  computer  technology  over  the  past  35  years  has  allowed  a  great  expansion  of 
the  types  of  problems  that  can  be  solved,  and  computational  turnaround  routinely 
decreases  by  multiple  factors  each  time  a  new  generation  of  computers  is  produced. 
Computer  microelectronics  has  progressed  from  the  production  of  the  first  integrated 
circuit  in  1959  to  the  manufacture  of  single,  printed  circuit  computers  which  can  carry 
out  more  than  ten  million  instructions  per  second.  This  improvement  in  chip  technology 
has  reduced  signaling  time  between  components,  since  the  distance  between  chips  has 
decreased.  It  has  also  decreased  the  net  cost  per  component,  and  this  has  led  to  the  idea 
of  putting  multiple  sequential  computers  together  to  greatly  enhance  net  performance.  In 
fact,  there  is  now  general  agreement  that  the  only  way  to  achieve  significant 
improvement  in  performance  is  through  concurrent  computation,  whereby  many 
computers/processors  are  used  in  tandem  to  solve  the  same  problem.  Massively  Parallel 
Processor  (MPP)  technology  is  an  example  of  this  concept.  MPP  machines  combine 
large  numbers  of  processors  (known  as  nodes)  to  carry  out  simultaneous  actions  on  data 
sets. There are two types of MPP machines. SIMD (Single Instruction Multiple Data)
machines simultaneously carry out the same program instructions on multiple nodes, each
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of  which  has  only  a  small  amount  of  resident  memory.  An  example  of  this  type  of 
computer  is  the  Connection  Machine  manufactured  by  Thinking  Machines,  Inc.  In 
contrast,  MIMD  (Multiple  Instruction  Multiple  Data)  machines  have  larger  memory 
storage per node, and thus each node can carry out a separate set of program instructions.
Examples of this type of computer include the nCUBE and Cray T3D computers. From
the  programmer’s  viewpoint,  making  an  algorithm  work  on  a  MIMD  machine  involves 
devising  a  way  to  decompose  the  problem  into  small  segments  to  which  an  individual 
node  can  be  assigned,  with  the  results  being  collated  at  the  end  of  each  individual  set  of 
calculations. 
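The decompose-and-collate pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions: the "nodes" are emulated by worker threads on one machine, and the per-node task (a sum of squares) and the function names are hypothetical stand-ins, not code from the work reported here.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(data, n_nodes):
    """Split data into n_nodes contiguous segments of nearly equal length."""
    base, extra = divmod(len(data), n_nodes)
    segments, start = [], 0
    for node in range(n_nodes):
        stop = start + base + (1 if node < extra else 0)
        segments.append(data[start:stop])
        start = stop
    return segments


def node_work(segment):
    """Hypothetical task carried out independently by one node."""
    return sum(x * x for x in segment)


def run_decomposed(data, n_nodes=4):
    """Assign one segment to each emulated node, then collate the results."""
    segments = split_into_segments(data, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partial_results = list(pool.map(node_work, segments))
    return sum(partial_results)  # collation step


data = list(range(1000))
total = run_decomposed(data, n_nodes=4)
```

On a real MIMD machine each segment would live in the memory of a separate node and the collation would be a message-passing step; the structure of the program is otherwise the same.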


2.  APPLICATION  OF  MPP  TO  DEFENSE-RELATED  RESEARCH 


Nuclear  Test  Ban  Treaty  Verification 


Much  of  our  inability  to  understand  the  complicated  behavior  of  geophysical 
phenomena  is  due  to  incomplete  modeling  of  the  underlying  physical  processes. 
Historically,  important  treaty  verification  research  problems  were  focused  on  estimating 
the yield of relatively large explosions (~150 kilotons) in support of Threshold Test Ban
Treaty (TTBT) monitoring. This work was primarily focused on test sites in the Former
Soviet Union (FSU). However, with the decline of the FSU and the current negotiations
in Geneva on a Comprehensive Test Ban Treaty (CTBT), an important research
goal  is  to  correctly  model  seismic  wave  propagation  between  relatively  small  nuclear 
weapons  tests  and  receivers  (seismometers)  separated  by  regional  distances  (500-2000 
km).  Contrasted  to  the  TTBT  monitoring  situation,  where  the  US  was  primarily 
concerned  with  monitoring  well  known  test  sites,  the  CTBT  monitoring  situation  is  one  in 
which  a  number  of  countries  have  become  potential  nuclear  proliferators.  This  means 
that  we  must  be  able  to  understand  seismic  wave  propagation  in  a  variety  of  geologically 
diverse  areas.  Figure  1  depicts  a  simple  picture  of  the  monitoring  problem.  On  the  left 
are  shown  various  seismic  sources  ranging  from  mine  blasts  to  earthquakes  that  must  be 
discriminated from nuclear weapons tests. The various seismic rays emanating from each
of  the  sources  are  recorded  by  a  seismometer  at  the  Earth's  surface.  Also  shown  are 
simplified  seismograms  from  each  of  the  sources.  The  problem  is  to  detect  and 
subsequently  identify  correctly  the  seismic  source  associated  with  each  seismogram. 


To  do  this  properly  many  factors  must  be  taken  into  account,  such  as  strong 
scattering  effects  in  the  shallow  crust,  source  behavior,  surface  topography  and  velocity 
heterogeneities.  One  of  the  most  widely  applied  numerical  techniques  used  to  solve  the 
seismic wave propagation problem is the finite difference technique, which involves the
solution  of  differential  equations  with  specific  boundary  conditions  over  a  spatial  grid 
describing  a  portion  of  the  Earth.  This  is  a  numerically  intensive  process,  because  to 
maintain  accuracy  in  computing  a  time-dependent  wavefield,  small  grid  spacing  and  time 
steps  must  be  taken.  The  use  of  MPP  allows  efficient  solutions  of  finite  difference 
problems through grid decomposition. The decomposition is done by dividing the spatial
grid into segments, each of which is placed on an individual node. The wavefield is then
propagated on each segment of the grid for a single time step, and the results are
communicated across nodes before moving on to the next time step. Using this type of
parallelization, seismic wave propagation using finite difference techniques becomes
feasible for large problems. For example, Figure 2 shows the superposition of a 2-D
wavefield calculated using the finite difference method over a heterogeneous structure
known as the Marmousi model. The expanding ring in the upper left-hand corner of the
model represents the spreading seismic wavefront propagating through the medium. The
grid dimensions in this problem are 751x2501, and the complete wavefield was generated
on 32 nCUBE nodes in approximately 13 minutes using 3000 time steps.
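The grid decomposition described above can be illustrated with a small sketch. It is not the code used for Figure 2: it assumes a second-order scalar (acoustic) wave equation, a hypothetical two-layer velocity model, fixed zero boundaries, and it emulates the nodes one after another inside a single process. Each emulated node updates its own block of columns after receiving one extra "halo" column from each neighbor, which stands in for the communication step between time steps.

```python
import numpy as np


def step(u, u_prev, r2):
    """One explicit time step of the 2-D scalar wave equation.

    r2 holds (c*dt/dx)**2 at every grid point; edge points are held at zero.
    """
    u_next = np.zeros_like(u)
    lap = (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
           - 4.0 * u[1:-1, 1:-1])
    u_next[1:-1, 1:-1] = (2.0 * u[1:-1, 1:-1] - u_prev[1:-1, 1:-1]
                          + r2[1:-1, 1:-1] * lap)
    return u_next


def step_decomposed(u, u_prev, r2, n_nodes):
    """Same time step, with the columns divided among n_nodes emulated nodes."""
    nx = u.shape[1]
    bounds = np.linspace(0, nx, n_nodes + 1).astype(int)
    u_next = np.zeros_like(u)
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        # "Communication": each node receives one halo column per neighbor.
        h_lo, h_hi = max(lo - 1, 0), min(hi + 1, nx)
        local = step(u[:, h_lo:h_hi], u_prev[:, h_lo:h_hi], r2[:, h_lo:h_hi])
        # Keep only the columns this node owns; halo results are discarded.
        u_next[:, lo:hi] = local[:, lo - h_lo:hi - h_lo]
    return u_next


def run(n_steps, n_nodes=None, shape=(64, 96)):
    nz, nx = shape
    # Hypothetical two-layer velocity model (relative units).
    c = np.where(np.arange(nz)[:, None] < nz // 2, 1.0, 1.5) * np.ones((nz, nx))
    r2 = (c * 0.4) ** 2  # c*dt/dx <= 0.6, below the 2-D stability limit 1/sqrt(2)
    z, x = np.meshgrid(np.arange(nz), np.arange(nx), indexing="ij")
    u = np.exp(-((z - 10.0) ** 2 + (x - 12.0) ** 2) / 8.0)  # initial pulse
    u_prev = u.copy()
    for _ in range(n_steps):
        if n_nodes is None:
            u_next = step(u, u_prev, r2)
        else:
            u_next = step_decomposed(u, u_prev, r2, n_nodes)
        u_prev, u = u, u_next
    return u


single = run(50)
decomposed = run(50, n_nodes=4)
```

Comparing `single` with `decomposed` confirms that the decomposed update reproduces the undivided calculation; on an actual MPP the loop over nodes runs concurrently and the halo copy becomes a message exchange.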


While  2-D  techniques  have  been  used  in  the  past  to  provide  significant  insight 
into  the  physics  governing  seismic  events,  it  is  now  possible  to  solve  very  large  scale  3-D 
propagation  problems.  A  3-D  wavefield  propagation  example  is  shown  in  Figure  3.  The 
model  is  a  two-layer  graben,  and  the  calculation  was  done  on  a  400x100x100  element 
grid.  The  run  time  on  64  nCUBE  2  processors  was  about  40  minutes  for  2000  time  steps. 
This  type  of  3-D  modelling  can  be  used  in  a  variety  of  applications,  including  nuclear 
test  ban  verification,  assessment  of  earthquake  hazards,  and  the  dual  use  technologies  of 
petroleum  exploration  and  civil  and  hydrological  engineering.  The  ability  to  use  the 
more  realistic  3-D  techniques  in  answering  some  of  the  important  questions  in  these 
fields will add greatly to our understanding of the underlying physics.
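The run times quoted for the 2-D and 3-D examples can be converted into a rough grid-point update rate. The sketch below uses only the figures stated in the text (grid sizes, time steps, approximate run times, node counts); the function name is hypothetical, and because the run times are approximate and the 2-D and 3-D stencils differ in cost per point, the rates should be read as order-of-magnitude indicators only.

```python
def updates_per_second(grid_shape, time_steps, minutes):
    """Grid-point updates per second for a finite difference run."""
    points = 1
    for n in grid_shape:
        points *= n
    return points * time_steps / (minutes * 60.0)


# 2-D Marmousi example: 751 x 2501 grid, 3000 steps, about 13 minutes, 32 nodes
rate_2d = updates_per_second((751, 2501), 3000, 13)
per_node_2d = rate_2d / 32

# 3-D graben example: 400 x 100 x 100 grid, 2000 steps, about 40 minutes, 64 nodes
rate_3d = updates_per_second((400, 100, 100), 2000, 40)
per_node_3d = rate_3d / 64

print(f"2-D: {rate_2d:.3g} updates/s total, {per_node_2d:.3g} per node")
print(f"3-D: {rate_3d:.3g} updates/s total, {per_node_3d:.3g} per node")
```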

Petroleum  Exploration  and  Recovery 

Some  of  the  most  computationally  expensive  problems  in  seismology  occur  in  the 
petroleum  exploration  and  recovery  industries.  Along  with  modelling  the  complex 
wavefields  which  are  seen  at  sensors  on  the  surface  and  in  boreholes,  the  exploration 
seismologist  is  interested  in  using  imaging  and  inversion  techniques  to  determine  the 
location  and  properties  of  subsurface  reservoirs.  Previously  the  imaging  and  inversion 
techniques  were  limited  to  small  subsets  of  data  over  smaller  regions  of  the  Earth,  but 
with  the  increasing  use  of  MPPs  in  imaging,  more  power  is  available  to  produce  realistic 
images  of  subsurface  reservoirs.  One  of  the  most  frequently  used  imaging  techniques  in 
exploration  seismology  is  known  as  migration.  It  is  primarily  done  to  refocus  events 
reflecting  from  subsurface  horizons  to  their  proper  location  in  time  and  depth.  There  are 
many  variations  in  migration  algorithms,  each  of  which  has  different  mathematical 
approximations to make the calculations more manageable. For example, a 2-D pre-stack
Kirchhoff migration code has been used to image the synthetic seismograms
produced  from  the  Marmousi  model  discussed  in  the  previous  section.  This  software  can 
image  the  entire  Marmousi  data  set  (240  seismic  sources  and  96  receiver  sensors)  in  four 
minutes.  The  results  from  this  Kirchhoff  imaging  procedure  are  shown  in  Figure  4, 
which  depicts  the  Marmousi  model  at  the  top  of  the  figure  and  the  imaging  results  at  the 
bottom.  This  type  of  turnaround  in  computing  the  migrated  image  makes  iterative 
imaging  (where  the  velocity  model  is  adjusted  slightly  and  the  imaging  step  is  done  to 
check  the  results)  a  viable  prospect.  This  is  an  exciting  application  of  MPP  technology  to 
the  detection  and  recovery  of  petroleum  reserves  which  previously  was  thought  to  be  in 
the distant future. The speed and versatility of the MIMD architecture will allow even
more powerful 2-D imaging and inversion techniques to be tested on observables and,
in the foreseeable future, the use of 3-D algorithms.
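The idea behind Kirchhoff migration, summing recorded energy along diffraction curves so that it refocuses at its point of origin, can be shown with a much simpler case than the pre-stack code discussed above. The sketch below is an assumption-laden stand-in: it treats zero-offset data, a constant velocity, and a single point diffractor, and it omits the amplitude and phase weights of a full Kirchhoff operator. All names and parameter values are hypothetical.

```python
import numpy as np


def model_point_diffractor(xs, t, x0, z0, v):
    """Zero-offset section for one point diffractor at (x0, z0).

    Each trace holds a unit spike at the two-way time 2*sqrt((x-x0)^2+z0^2)/v.
    """
    dt = t[1] - t[0]
    data = np.zeros((len(xs), len(t)))
    for i, x in enumerate(xs):
        k = int(np.rint(2.0 * np.hypot(x - x0, z0) / v / dt))
        if k < len(t):
            data[i, k] = 1.0
    return data


def kirchhoff_migrate(data, xs, t, zs, v):
    """Diffraction-stack migration with constant velocity v.

    For every image point (x, z), sum the samples that lie on its
    diffraction hyperbola across all traces.
    """
    dt = t[1] - t[0]
    trace_idx = np.arange(len(xs))
    image = np.zeros((len(xs), len(zs)))
    for ix, x in enumerate(xs):
        for iz, z in enumerate(zs):
            k = np.rint(2.0 * np.hypot(xs - x, z) / v / dt).astype(int)
            ok = k < len(t)
            image[ix, iz] = data[trace_idx[ok], k[ok]].sum()
    return image


xs = np.arange(0.0, 1000.0, 20.0)   # trace positions (m)
t = np.arange(0.0, 1.0, 0.004)      # two-way time axis (s)
zs = np.arange(20.0, 800.0, 20.0)   # image depths (m)
v = 2000.0                          # constant velocity (m/s)

data = model_point_diffractor(xs, t, x0=500.0, z0=400.0, v=v)
image = kirchhoff_migrate(data, xs, t, zs, v)
ix, iz = np.unravel_index(np.argmax(image), image.shape)
print("image peak at x =", xs[ix], "m, z =", zs[iz], "m")
```

Every image point is computed independently of the others, which is why this class of algorithm maps naturally onto a MIMD machine: blocks of image points (or blocks of input traces) can be assigned to separate nodes.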

Environmental  Modeling 

It  is  recognized  that  the  DoD  is  facing  an  enormous  expense  to  environmentally 
restore  many,  if  not  all,  of  its  military  installations.  Estimates  of  this  expense  have  been 
projected to be in the billion-dollar range. Geophysical modeling and simulation have
the potential to save the DoD tremendous expense, and there are several environmental
problems  which  are  of  critical  importance  to  the  DoD  that  can  be  solved  effectively  using 
MPPs.  For  example,  the  subsurface  migration  of  hazardous  waste  materials  can  lead  to 
potential  water  table  contamination.  Ground-penetrating  radar  (GPR)  is  a  technique  used 
to  image  the  shallow  subsurface  using  high-frequency  electromagnetic  waves  that  reflect 
from  subsurface  contrasts  in  dielectric  constant.  By  relating  radar  propagation  velocities 
to  subsurface  water  content  through  empirical  relations,  a  potentially  important  indicator 
of  contamination  can  be  derived.  Before  a  reasonable  picture  of  the  subsurface  can  be 
obtained,  however,  several  data  processing  steps  must  be  taken.  These  include  a 
complicated stacking and enhancement of the GPR data to accurately resolve the
positions of reflective contrast in the subsurface. These steps are computationally
intensive,  and  can  be  accomplished  efficiently  by  processing  algorithms  resident  on  an 
MPP.  As  an  example,  Figure  5  shows  a  grayscale  picture  of  GPR  data  acquired  in  the 
Chalk  River  test  area  in  Canada  prior  to  the  data  processing  and  imaging  steps.  There  are 
very few coherent reflectors apparent in the subsurface besides the strong one dipping to
the  right  between  Common  Midpoints  (CMPs)  202  and  902.  Figure  6  shows  the  effects 
of  the  processing.  The  region  on  the  right  changes  dramatically,  with  many  reflectors 
emerging  from  the  stack,  and  the  continuity  of  the  reflections  on  the  left  is  improved. 
Using  the  results  from  the  radar  propagation  velocity  processing,  the  subsurface  water 
content  can  be  estimated  for  the  Chalk  River  area,  as  shown  in  Figure  7.  The  results 
show  a  zone  of  increasingly  shallow  high  water  content  from  left  to  right  across  the 
profile,  which  can  be  interpreted  as  an  indication  of  a  rising  water  table.  Since  this  region 
of  high  water  content  cuts  across  the  detailed  reflection  structure,  it  implies  that  water- 
filled  porosity  and  permeability  pathways  are  not  constrained  to  apparent  stratigraphic 
structure. 
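The step from radar propagation velocity to water content can be sketched as follows. The text does not name the empirical relation that was used; the example below assumes the widely used Topp et al. (1980) polynomial for mineral soils and a low-loss, non-magnetic medium, so that the relative permittivity is (c/v)^2. The velocities are hypothetical values chosen for illustration, not Chalk River data.

```python
C_VACUUM = 0.299792458  # speed of light in vacuum, m/ns


def relative_permittivity(velocity_m_per_ns):
    """Relative permittivity of a low-loss, non-magnetic medium from radar velocity."""
    return (C_VACUUM / velocity_m_per_ns) ** 2


def topp_water_content(eps_r):
    """Volumetric water content from permittivity (Topp et al., 1980).

    Assumed here as the empirical relation; the source does not specify one.
    """
    return -5.3e-2 + 2.92e-2 * eps_r - 5.5e-4 * eps_r ** 2 + 4.3e-6 * eps_r ** 3


# Hypothetical radar velocities (m/ns), from fast (dry) to slow (wet) ground
velocities = [0.15, 0.10, 0.06]
water = [topp_water_content(relative_permittivity(v)) for v in velocities]
for v, w in zip(velocities, water):
    print(f"v = {v:.2f} m/ns -> water content = {w:.3f}")
```

Slower radar velocities map to higher water content, which is the basis for reading a velocity section as a water-content section.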

Another  potential  environmental  application  of  MPP  technology  still  at  a  basic 
research  level  is  the  simulation  of  seismoelectric  wave  propagation.  When  seismic  waves 
propagate  through  a  fluid-saturated  sedimentary  material,  electrical  current  systems  are 
set  up  in  the  material,  which  induce  non-radiating  fields.  When  the  seismic  waves 
impinge on a contrast in electrical and/or mechanical properties, the current systems on
both sides form a complex dynamic current system which generates electromagnetic
waves. These waves are detectable at the surface of the Earth using dipole antennas. A
schematic of this phenomenon is shown in Figure 8. By performing joint
electromagnetic and seismic surveys over shallow crustal areas, a more useful estimate of
subsurface properties may be determined. This might lead to powerful indicators of
environmental contamination, since the experiment would provide estimates of the
mechanical, fluid, and electromagnetic properties underneath the survey area. Examples
of  the  fields  generated  from  an  electroseismic  survey  are  shown  in  Figures  9  and  10.  The 
model  in  Figure  10  is  that  of  a  typical  road  fill  material  superposed  on  top  of  a  glacial 
sedimentary  sequence.  The  electromagnetic  and  seismic  wavefields  are  calculated  using 
an MPP to solve the complicated coupled set of equations which describes the
seismoelectric  phenomena.  Figure  10  shows  the  modeled  electrical  and  seismic 
wavefields,  as  well  as  the  difference  in  electromagnetic  signal  as  a  function  of  the  depth 
of  road  fill.  Without  the  use  of  an  MPP  to  calculate  the  wavefields,  the  computational 
expense of testing this potentially powerful technique would be exorbitant.

Characterization  of  Underground  Structures 

The  same  geophysical  modeling  techniques  that  are  described  above  in  the  section 
on environmental modeling are directly applicable to the problem of detecting and
characterizing underground structures. The importance of this work was highlighted in
the Iraq conflict. For pre-strike targeting purposes, pilots need to know the exact
location of underground targets, as well as the physical properties of
the structure and of the geology in which it is embedded. In a post-strike
mode, additional information based on geophysical observations and modeling can be
useful for assessing the level of damage.

3. MPP LIMITATIONS

The current limitations on the use of MPP in defense-related
geophysical research are in both the hardware and software areas. The hardware
limitations  include  CPU  limits  on  modeling,  imaging  and  inversion  algorithms,  memory 
bounds  on  large  modeling  problems,  and  input/output  bounds  on  out-of-core  imaging 
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efforts.  The  software  limitations  involve  input/output  functionality  (e.g.,  asynchronous 
I/O),  code  portability,  code  sharing  by  programmers  over  communication  networks  such 
as  the  Internet,  and  programmer  training  to  increase  the  number  of  MPP-literate 
computational  scientists. 

While  software  advances  have  been  steady,  they  have  been  eclipsed  by  advances 
in hardware. What is needed at this time are efforts to support software development
and the training of scientists to use MPP, in order to exploit fully its potential to
solve both DoD and civilian problems.

4.  CONCLUSIONS  AND  SUGGESTIONS 

As  a  DoD  program  manager,  with  responsibility  for  contract  research  programs  in 
both  CTBT  monitoring  and  environmental  areas,  I  am  most  interested  in  the  application 
of  MPP  to  the  solution  of  relevant  DoD  problems.  MPP  have  the  extraordinary  potential 
to  provide  DoD  with  the  insight  and  knowledge  necessary  to  solve  our  most  longstanding 
and  difficult  problems.  However,  I  am  very  concerned  with  the  apparent  lack  of 
availability  of  MPP  for  the  general  scientific  community  and  specifically  the  university 
community.  I  believe  the  foundation  for  advancements  in  the  application  of  MPP  to 
important  DoD  and  civilian  problems  will  come,  in  large  part,  from  the  young  scientists 
currently being trained in our universities. Therefore, I believe the DoD community
should make strenuous efforts to ensure that MPP are readily available. The software
limitations mentioned above will then, through the midnight oil of PhD dissertations, in large
part disappear.

One  efficient  and  cost  effective  way  to  achieve  this  goal  is  for  DoD  to  foster  and 
support  partnerships  between  industry  and  academia.  While  there  may  be  many  ongoing 
partnerships,  one  that  I  am  aware  of  is  between  the  Massachusetts  Institute  of  Technology 
(MIT)  and  nCUBE.  nCUBE  has  placed  an  MPP  at  MIT's  Earth  Resources  Laboratory 
(ERL)  and  is  cost  sharing  the  operation  of  this  computer  with  ERL.  ERL  in  turn  has 
made  this  MPP  available  to  a  wide  audience  of  students  and  other  users,  including 
providing much needed training on how to utilize the potential of MPP. Everyone,
including  the  DoD,  benefits  by  such  partnerships. 
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Figure  3 
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Figure  6 
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The  ElectroSeismic  Method 


Figure  8 
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Figure  9 
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Figure  10 
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MULTIDIMENSIONAL 

HYDRODYNAMICS 


As  a  Tool  in  Nuclear  Test  Ban 
Verification 

□  Background 

□  Shock  wave  methods 

□  Simulated  explosions 

□  Yield  estimates 

□  Concluding  remarks 
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TEST  BAN  VERIFICATION 

UIUC

Background 

•  Hydrodynamic  yield  estimation  methods 
originally  developed  for  testing  program 

•  When  PNET  was  negotiated  in  1974,  HYE 
was  incorporated  as  a  verification  tool 

•  In  mid-80’s,  Reagan  administration  insisted 
on  HYE  &  CORRTEX  sensors  for  TTBT  - 

-  Unhappy  with  seismic  methods 

-  Wanted  to  signal  a  higher  verification  std 

-  Wanted  to  focus  attention  on  past  treaties 
to  forestall  movement  toward  a  CTBT 

•  Use  of  HYE  for  verification  is  very  different 
from  use  in  a  weapon  testing  program 

•  HYE sensors & methods are highly classified

•  In  1986,  strengths  &  weaknesses  of  HYE 
as  a  verification  tool  were  poorly  known 

•  I  became  involved  in  1986  due  to  a  request 
from  DARPA's  NMRO  to  the  DSSG 
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SHOCK  WAVE  METHODS 

Before  Explosion 


Experimental  CORRTEX 

equipment  recorder 
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SHOCK  WAVE  METHODS 


After  Explosion 


Experimental  CORRTEX 

equipment  recorder 
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SHOCK  WAVE  METHODS 


Yield  Estimation  Algorithms 


Components 

•  Model  of  the  motion  of  the  shock  front  that 
depends  on  the  yield 

•  A  procedure  for  fitting  the  shock-front 
position  predicted  by  the  model  to  the  data 

Examples 

°  Power-law  model* 

°  Analytical  model* 

Similar  explosion  scaling* 

Simulated  explosion  scaling* 

Suite  of  numerical  simulations 

Assumes  cube-root  scaling 
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SHOCK  WAVE  METHODS 

Cube-Root  Scaling 


Requirements 

•  Explosions  are  spherically  symmetric 

•  Explosion  sources  all  have  same  e.o.s. 

•  M,  R,  P  of  the  sources  scale  with  W 

•  Ambient  media  are  identical  and  uniform, 
can  be  treated  as  fluids 

•  Heat transport and viscous energy dissipation
behind  the  shock  front  are  negligible 


Scaling 

R(t;W) = W^{1/3} g(t / W^{1/3})

where  g(x)  describes  the  evolution  in  time  of 
an  explosion  with  yield 
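A yield estimate based on cube-root scaling can be sketched as follows. The sketch assumes the standard cube-root form, R(t;W) = W^(1/3) g(t/W^(1/3)), with g the shock-front radius history of a reference explosion, and it uses a hypothetical power-law reference curve whose coefficients are illustrative only. The fitting procedure (a least-squares scan over candidate yields) is likewise an assumption, not the algorithm used in the work summarized on these slides.

```python
import numpy as np


def g_reference(x, a=6.0, b=0.48):
    """Hypothetical power-law reference curve: shock radius (m) vs. time (ms)."""
    return a * x ** b


def scaled_radius(t, W):
    """Shock-front radius at time t for yield W under cube-root scaling."""
    s = W ** (1.0 / 3.0)
    return s * g_reference(t / s)


def estimate_yield(t, r_obs, candidates):
    """Return the candidate yield whose scaled curve best fits the data."""
    misfit = [np.sum((scaled_radius(t, W) - r_obs) ** 2) for W in candidates]
    return candidates[int(np.argmin(misfit))]


# Synthetic, noise-free "measurements" for a 125 kt explosion on a 1.0-2.5 ms interval
t_ms = np.linspace(1.0, 2.5, 16)
r_obs = scaled_radius(t_ms, 125.0)

candidates = np.arange(1.0, 1000.5, 0.5)  # trial yields in kt
w_est = estimate_yield(t_ms, r_obs, candidates)
print("estimated yield:", w_est, "kt")
```

With noise-free synthetic data the scan recovers the input yield; the complications listed on the following slides (aspherical sources, rock strength, layering) are exactly the effects that break the assumptions built into `scaled_radius`.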
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SHOCK  WAVE  EVOLUTION 



Potential  Complications 

Complications  caused  by  the  source 

•  Canisters  are  meters  in  diameter 

•  Source  shape  is  aspherical 

•  Hence  shock  R  vs.  t  depends  on  the  source 

Complications  in  a  uniform  medium 

•  Motion  is  affected  by  phase  transitions 

•  Motion  is  affected  by  rock  strength 

•  Shock  splits  into  two-wave  structure 

Complications  due  to  inhomogeneities 

•  Complex  energy  flows  in  pipes/tunnels 

•  Distortion  of  shock  wave  by  layers 

•  Distortion  of  shock  wave  by  voids 

FKL2/90 
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NUMERICAL  SIMULATIONS 


Explosion  Parameters 
Source  Geometries 


Other  Parameters 

•  All explosions have yields of 125 kt

•  All  explosions  are  in  Westerly  granite 
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NUMERICAL  SIMULATIONS 


ZEUS-2D  Hydrodynamics  Code 


•  Uses  finite-difference  equations  and  a 
second-order-accurate,  operator-split, 
multi-step,  explicit-time  integration  scheme 

•  Von  Neumann  artificial  viscosity  spreads 
shock  waves  over  several  mesh  cells 

•  Advection  is  treated  using  van  Leer's 
monotonicity-preserving  algorithm 

•  Uses  the  first-order  method  of  LeBlanc 
to  treat  material  interfaces 


•  Results  presented  here  use  cells 

•  Code  was  tested  by  solving  a  large  suite 
of  test  problems 

•  Simulated  explosions  were  compared  with 
exact  analytical  solutions  for  ideal  gases 
and ZEUS-1D simulations for granite
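The role of artificial viscosity in a finite-difference hydrodynamics code can be illustrated with a short sketch. It assumes the common quadratic von Neumann-Richtmyer form, in which an extra pressure-like term is switched on only in cells under compression; the coefficient value and the function name are hypothetical, and this is not taken from the ZEUS-2D source.

```python
import numpy as np


def artificial_viscosity(rho, u_faces, c_q=2.0):
    """Quadratic artificial viscosity on a 1-D staggered mesh.

    rho      : cell-centered densities, length N
    u_faces  : velocities at cell faces, length N + 1
    c_q      : dimensionless coefficient (assumed value); larger values
               spread a shock over more mesh cells
    """
    du = np.diff(u_faces)  # velocity jump across each cell
    # Active only where the cell is being compressed (du < 0).
    return np.where(du < 0.0, c_q * rho * du ** 2, 0.0)


rho = np.ones(4)
u_faces = np.array([1.0, 1.0, 0.0, 0.0, 0.5])
q = artificial_viscosity(rho, u_faces)
print(q)
```

In this example only the second cell is compressed, so it alone receives a nonzero viscous pressure; the expanding last cell receives none.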
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YIELD  ESTIMATES 


Assumptions 


•  No  uncertainties  in  shock  front  positions  or 
times 

•  All explosions are in Westerly granite

•  All  explosions  are  axially  symmetric 

•  Axis  of  symmetry  is  known  exactly,  only  the 
vertical position z0 is unknown

•  'Unknown'  and  'reference'  explosions  all 
have  same  yield  (125  kt) 

•  Data analysis interval is 1.0-2.5 ms
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MULTIDIMENSIONAL 

HYDRODYNAMICS 


Concluding  Remarks 


•  HYE  problem  still  unsolved  (but  less  urgent) 

•  The  main  difficulty  is  lack  of  data:  one  set  of 
sensing  cables  does  not  give  enough 

•  These  limitations  must  be  recognized,  esp.  if 
HYE  is  ever  used  for  PNET/TTBT  verification 

•  Inadequate  understanding  due  in  part  to 

-  secrecy 

-  lack  of  will 

-  limited  access  to  computers  and  codes 
plus  desire  to  preserve  3rd  generation 
weapons  programs  (esp.  X-ray  laser) 
led  to  weak  verification  protocols 


This  does  not  mean  that  the  U.S.  should 
devote  more  resources  to  HYE  or  that  the 
protocols  should  be  renegotiated.  Can  accept 
current  seismic  capabilities  or  improve  them. 
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Computing  Systems  Technology  Office 
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DigHai  Library  dlaousalon 


Linking  Electronic  Libraries 


CNRI  1994 


Electronic  Copyright  Management  System 
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Sector-oriented  Crosscut 
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Info  Electronic  Carpool  Governme 

Kiosk  Procurement  Formation  EForms 

For  Official  Use  Only 
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Very  High  Speed  Signal  Acquisition  and 

Processing 


Professor  Thomas  A.  Prince 


California  Institute  of  Technology 


High  Speed  Signal 
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CSCC  COMPUTER  ENVIRONMENT 
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Mux  Ueusity  (urbllrury  units) 
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MPP  HARDWARE 


Oonn^otlonn 
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Concurrent  Supercomputing  Consortium 

(CSCC) 
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-intei  Touchstone  Deita  (1991) 
-Scalable I/O Initiative (1994)


Concurrent  Supercomputing  Consortium  (CSCC) 

Membership 

•  Additional  members  of  the  consortium  are: 

-  Lawrence  Livermore  National  Laboratory 

-  Los  Alamos  National  Laboratory 

-  Purdue  University 

-  Sandia  National  Laboratories 
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Concurrent  Supercomputing  Consortium  (CSCC ) 

Membership 

•  The  consortium’s  partners  are: 

-  Argonne  National  Laboratory 

-  California  Institute  of  Technology 

-  Caltech’s  Jet  Propulsion  Laboratory 

-  The  Center  for  Research  in  Parallel  Computation 

°  (a  National  Science  Foundation  Science 
and  Technology  Center:  Rice,  Caltech,  Argonne,  Los 
Alamos,  Syracuse,  Tennessee/Oak  Ridge) 

-  Defense  Advanced  Research  Projects  Agency 

-  Intel  Corporation’s  Supercomputer  Systems  Division 

-  National  Aeronautics  and  Space  Administration  (NASA) 

-  National  Science  Foundation  programs  in  computational 
science  and  engineering 

-  Pacific  Northwest  Laboratory 
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What  is  a  High-Performance 
Computing  Environment? 
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HIGH-END  APPLICATIONS: 
SPECIFIC  EXAMPLES 
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200,000  unknowns,  250  GBytes  desirable 

Memory  and  I/O  Limited 


SPEED/STORAGE 

1  MF  op  X  ^  RELATIONSHIP 


10 KByte/s I/O


EXAMPLE:  FFT 

(FAST FOURIER TRANSFORM)
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approximately  balanced 




100 MByte/s I/O




10  GByte/s  I/O 


SCALABLE  I/O  INITIATIVE 
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-  CSCC  Intel  Paragon  at  Caltech,  IBM  SP-2  at 
ANL,  Cray  T3D  at  JPL 


SAR  Applications 
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processing  for  1 5  s  of  data  collection) 

Evaluation of NASA SAR Free Flyer

Require  real  time  processing  for  8-12  channels 
Require  large  bandwidth  and  high-volume  storage 


SAR Applications

TABLE 1.2  Key Parameters for Free-flying SAR Satellite Systems

Satellite                 Seasat      ALMAZ           E-ERS-1    J-ERS-1      Radarsat
Agency/Country            NASA/USA    USSR            ESA        NASDA/Japan  Canada
Launch Date               1978        1991            1991       1992         1995
Altitude (km)             800         280             785        565
Frequency Band (GHz)      L(1.3)      S(3.0)          C(5.3)     L(1.2)       C(5.3)
Polarization              HH          HH              VV         HH           HH
Incidence Angle           23°         30-60°          23°        35°          20-50°
Antenna Size (m x m)      10.7 x 12   15 x 1.5 (two)  10 x 1.0   12 x 12      15 x 1.6
Noise Equiv (dB)          -18         -               -18        -20          -21
Swath Width (km)          100         25-50           100        75           50-500
Az Resolution (m)/Looks   23/4        15/2            25/3       30/4         IS/4
Range Bandwidth (MHz)     19          Uncoded         13J        15           11.5, 17J, 30
Quantization (bps)        Analog      J               5          3

TABLE 1.3  Key System Parameters for Selected Airborne SAR Systems

Agency                     NASA/JPL                  NAVY/ERIM                 CCRS/MDA
Aircraft                   DC-8                      NADC P-3                  CV-580
Frequency (GHz)            0.44, 1.25, 5.3           1.25, 5.3, 9.34           5.3, 9-15
Polarization               Quad                      Quad                      Dual (like, cross)
Az Resolution (m)/Looks    8/4                       2.2/1                     6/-
Range Bandwidth (MHz)      20, 40                    50, 100                   100
Swath Width (km)           10-18                     6-48                      l8-cf
Look Angle (degrees)       10-65                     11-79                     0-8“
Quantization (bps)         8                         6                         6
Noise Equiv (far range)    -38(P), -40(L), -36(C)    -45(L), -25(C), -31(X)    -41(C)

Parameters for the Shuttle Imaging Radar Missions

Mission                    SIR-A       SIR-B       SIR-C                         X-SAR
Date                       1981        1984        1993, 1994                    1993, 1994
Altitude (km)              259         225         215                           215
Frequency Band (GHz)       L(1.28)     L(1.28)     L(1.28), C(5.3)               X(9.6)
Polarization               HH          HH          HH, HV, VH, VV                VV
Incidence Angle            50°         15-60°      15-60°                        15-60°
Antenna Size (m x m)       9.4 x 12    10.7 x 22   111 x 18 (L), 111 x a8 (C)    111 x 0.4
Noise Equiv σ0 (dB)        -25         -35         -50(L), -40(C)                -26
Swath Width (km)           50          15-50       30-100                        10-45
Az, Rng Resolution (m)     4.7-33      5.4/14.4    6.1/8.7                       6.1/8.7

=  2^R^  (^)  ~  (^)  (^) 
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vmagc^ 


(Sx)(Sr) 


'[QOminJ  \100kmJ\2i 


20m , 


SEASAT 
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100  km 

Parameter                    Symbol     Nominal

Azimuthal Resolution (m)     δx         23 m
Range Resolution (m)         δr         20 m
Fraction of earth surface    f_earth    0.3
Radius of earth (km)         R_earth    6400
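The nominal values listed above give a feel for the data volumes involved in SAR imaging. The sketch below assumes, as an illustration only, that the quantity of interest is the number of resolution cells needed to cover the stated fraction of the Earth's surface once, computed as f_earth * 4 * pi * R_earth^2 / (δx * δr). The slide's own expression did not survive reproduction, so this formula and the function name are assumptions.

```python
import math


def resolution_cells(f_earth, r_earth_km, dx_m, dr_m):
    """Number of (dx x dr) resolution cells covering a fraction of a sphere."""
    area_m2 = f_earth * 4.0 * math.pi * (r_earth_km * 1.0e3) ** 2
    return area_m2 / (dx_m * dr_m)


# Nominal values from the table: f_earth = 0.3, R_earth = 6400 km,
# azimuthal resolution 23 m, range resolution 20 m
n_cells = resolution_cells(0.3, 6400.0, 23.0, 20.0)
print(f"resolution cells for one coverage: {n_cells:.3g}")
```

Under these assumptions a single coverage amounts to a few times 10^11 resolution cells before any allowance for multiple looks, channels, or repeat passes.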
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Broadband  Signal  Acquisition 
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approach  to  acquisitlon/reductlon/atorage 
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CSCC  COMPUTER  ENVIRONMENT 
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entire  RF-hasdand  chain  mag  be 
duplicated for second polarization


SCU  43154 


Pulsar  Digital  Recording  System 

Overview of digital hardware (single wire-wrap board)

[Block diagram. Labeled elements: CMOS VLSI; quad 2-bit ADC; CMOS/ECL level shift; tape interface; comparator reference voltages; 50 MHz clock; 50 MHz in; 12 12-bit D/A converters; phase-lock loop; voltage-controlled crystal oscillator (50 MHz, ± 20 ppm range); microprocessor. Inputs: RS-232 command interface, RS-232 timecode, 1-PPS, 5 MHz reference. Output to tape recorder: 8 bits parallel data, 1 bit parity, and a 50, 25, or 12.5 MHz write clock.]

Figure  1 


SCU  6.3.94 




w  V  KU  40-m  telescope  ouu-MHz  receiver  and  digital  recorder  block  diagram 
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OUTLINE 


INTRODUCTION 
Problem  Definition 
Direct  Insertion  vs  Surrogates 

BAYESIAN-VALIDATED  SURROGATES 
General  Framework 
Attributes 

Contributing  "Technologies” 
APPLICATIONS 

Effective  Conductivity  of  Composite 
(Stokes)  Drag  on  Axisymmetric  Body 

SAMPLE PROOF (K = L = J = 1)

$m_\rho(\mathcal{D})$ Cumulative Distribution Function

PAC Statement

PN and PC Results

Multiple  Studies  (vs  Random  Search) 
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INTRODUCTION 


Preliminaries 

□ Physical system → mathematical system

□ Deterministic mathematical system $\mathcal{M}^{\infty}$:

p, input vector in $\Omega \subset \mathbb{R}^{M}$

s, (single) output in $\mathbb{R}$ ($\mathbb{R}^{K}$)

$S(p)$: $\Omega \to \mathbb{R}$, input-output function ($L^{\infty}$)

□ Global optimization problem (goal):

λ, target output value

$p^{*}(\lambda) = \arg\min_{p \in \Omega} \left( \Psi(p, \lambda) = |S(p) - \lambda| \right)$

o Extension: $\Psi(p, \lambda) = \psi(S(p), p, \lambda)$
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□  Simulation  M^: 


$\mathcal{M}^{0}$: $p \in \Omega \to R_p = S(p) + W_p$ (noise)

$W_p \sim f_W(w \mid p) = h(w/\sigma_W(p)) / \sigma_W(p)$ (sym)

$W_p$ unbounded: $h(v) > 0$ for all $v \in \mathbb{R}$
sources: Monte-Carlo variance, ...

$W_p$ bounded: $h(v) = 0$ for $|v| > c_W$
sources: incomplete iteration, ...


□ Complexity conditions:

1. $p \in \Omega \to R_p$ expensive (time, cost)

2. no economies of scale for $R_{p_1}, R_{p_2}, \ldots$

3. limited regularity (information) for $S(p)$

4. noise estimable and controllable
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Simulation-Based  Optimization 


□ Exact design problem:

$p^{*}(\lambda) = \arg\min_{p \in \Omega} |S(p) - \lambda|$

□ Direct Insertion approach:

$p^{*}_{DI}(\lambda) = \arg\min_{p \in \Omega} |R_p - \lambda|$

NLP($\mathcal{M}^{0}$) (simulation as function call)

• robustness ?

• premature termination ?

• mid-process flexibility (λ, ...) ?

• incorporation of prior information ?

• design interactivity ?
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□  Surrogate  approach: 


1) surrogate construction and validation:

prior information

$p \in \Omega \to R_p = S(p) + W_p$
construct ⇓ validate
$\tilde{\mathcal{M}}$: $\tilde{S}(p) \approx S(p)$ over Ω

2) surrogate-based optimization:

$\tilde{p}^{*}(\lambda) = \arg\min_{p \in \Omega} |\tilde{S}(p) - \lambda|$: NLP($\tilde{\mathcal{M}}$)

• robust, complete optimization

• significant mid-process flexibility (λ)

• prior information readily integrated

• highly interactive ... BUT purposiveness

How does $\tilde{S}(p)\ (\neq R_p) \neq S(p)$ affect
predictability: $\tilde{S}(\tilde{p}^{*})$ vs $S(\tilde{p}^{*})$ ? necessary
optimality: $\tilde{p}^{*}$ vs $p^{*}$, $S(\tilde{p}^{*})$ vs $S(p^{*})$ ? desirable
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□  Simple  example: 

• flow of water past sphere

• incompressible Navier-Stokes
p = log (Reynolds) ∈ Ω = [1,3] ⊂ ℝ

s = drag coefficient, $C_D$; $S(p) = C_D(\log Re)$

• optimization problem

λ = target drag coefficient, $c_D$
$(\log Re)^{*} = \arg\min_{\log Re \in \Omega} |C_D(\log Re) - c_D|$

• $\mathcal{M}^{0}$ = YOUR CODE HERE

$R_{\log Re} = C_D(\log Re) + W_{\log Re}$

o Direct Insertion approach:

$(\log Re)^{*}_{DI} = \arg\min_{\log Re \in \Omega} |R_{\log Re} - c_D|$

• $\tilde{\mathcal{M}}$: $\tilde{S}(p) = \tilde{C}_D(\log Re)$

o Surrogate approach:

$(\widetilde{\log Re})^{*} = \arg\min_{\log Re \in \Omega} |\tilde{C}_D(\log Re) - c_D|$
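The surrogate approach in this simple example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the Schiller-Naumann sphere-drag correlation stands in for the expensive Navier-Stokes simulation, the Gaussian noise level `sigma`, the nine training points, the piecewise-linear surrogate, and the target value 1.0 are all hypothetical choices made here.

```python
import random


def drag_coefficient(log_re):
    """Stand-in for S(p): Schiller-Naumann sphere drag correlation (an assumption)."""
    re = 10.0 ** log_re
    return 24.0 / re * (1.0 + 0.15 * re ** 0.687)


def simulate(log_re, rng, sigma=0.01):
    """Stand-in for the simulation: R_p = S(p) + W_p, with assumed Gaussian noise."""
    return drag_coefficient(log_re) + rng.gauss(0.0, sigma)


def build_surrogate(train_points, rng):
    """Piecewise-linear interpolant through one noisy simulation per training point."""
    samples = [(p, simulate(p, rng)) for p in sorted(train_points)]

    def surrogate(p):
        for (p0, r0), (p1, r1) in zip(samples, samples[1:]):
            if p0 <= p <= p1:
                t = (p - p0) / (p1 - p0)
                return (1.0 - t) * r0 + t * r1
        raise ValueError("p lies outside the training range")

    return surrogate


def optimize(surrogate, target, lo=1.0, hi=3.0, n=2001):
    """Grid search for arg min |surrogate(p) - target| over [lo, hi]."""
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return min(grid, key=lambda p: abs(surrogate(p) - target))


rng = random.Random(0)
train = [1.0 + 0.25 * i for i in range(9)]  # nine simulations over Omega = [1, 3]
surrogate = build_surrogate(train, rng)
target = 1.0                                # hypothetical target drag coefficient
p_star = optimize(surrogate, target)
print(p_star, drag_coefficient(p_star))
```

Once the nine simulations are paid for, the optimization touches only the cheap surrogate, so the target can be changed mid-process at no further simulation cost. The price is the gap between the surrogate and the true input-output function, which is what the validation step is meant to bound.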
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BAYESIAN-VALIDATED  SURROGATES 


[Flow chart: general framework]

train: inputs {P ∈ Ω} → surrogate $\tilde{\mathcal{M}}$

test: sample {(P, R_P = S(P) + W_P)}, with ε2 and ε1 or η specified

PAC(U, 1 − ε1, 1 − ε2) statement (L = 1):
$|S(p) - \tilde{S}(p)| < U$ over a fraction of Ω > 1 − ε1, with probability > 1 − ε2

surrogate-based optimization: $\tilde{p}^{*}(\lambda)$

a posteriori error analysis, with distance Δ(·,·):
predictability (general)
optimality (quasi-convex)
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VN  or  VC 


□  Attributes  (vs  alternative  schemes): 


+  rigorous  bounds  (vs  plausible) 

+  fixed  validation  sample  size 
(vs  asymptotic  results) 

+  verifiable  hypotheses  on  <S(p) 

(vs  convenient  assumptions) 

+ confidence interval (vs expectation)

+ nonparametric framework

(vs p* ~ p conditional)

+ purposive estimates (vs suggestive)

+ elemental, sequential, adaptive

(vs global, batch, open-loop)

± worst-case analysis (vs average-case)
± limited resampling (vs "1-left-out")

− volumetric

poor coordinate localization in Ω
unless highly correlated inputs (shape)
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□  Contributing  “technologies”; 


o BLUP computer-simulation surrogates

[Sacks, Morris, ...]

o PAC framework [Valiant, ...]
o Information-based complexity theory

[Traub, Wozniakowski, ...]

o theory [Niederreiter, ...]

o Stochastic optimization [Rubinstein, ...]

o Statistical prediction:

• experimental response surfaces [Box, ...]

• train-test procedures [Weiss & Kulikowski, ...]

• tolerance limits, order statistics [David, ...]

• nonparametric nonlinear regression [Härdle, ...]

• hypothesis-testing [Pratt & Gibbons, ...]

• cross-validation techniques [Stone, ...]

o System identification [Bohlin, ...]
o Optimization models [Barthelemy, Haftka, ...]




APPLICATION:  Effective  Conductivity 


□  isotropic,  fibrous  composite 

[Figure: periodic supercell of insulating inclusions (concentration φ) in a continuous phase (unit conductivity), with axes x(1) and x(2); conduction heat transfer/depth across plane under an imposed temperature gradient.]


ke  —  ^e(0)  —  Jim  <  K  —  ^5  X^)  RSA 

6-^00  — 

$p = \phi \in \Omega = [0.05, .5] \subset \mathbb{R}^{M=1}$

$s = k_e$

$S(p) = \mathcal{K}_e(\phi)$


□  (simulation) 

o  Numerical  approach 

•  Monte-Carlo:  <  K  'K^  =  -fs'i 

lyj* 

•  FEM:  V\=  ^  Ki  =  K:ri4>, do, y^) 

variational-bound  nip  treatment 
automatic  data  parallel  partition 
automatic  parallel  mesh  generation  (Hecht) 
ℙ2 isoparametric discretization

parallel  conjugate  gradient  iteration 
iPSC/860  hypercube  implementation 

o  Computational  requirements: 

•  4>  =  0.50,  Nr  =  20-^ 

iPSC/860  (16  nodes):  47  mins,  $10 

• parallel advantage:

                time    cost
workstation     12x     -
vector super    -       10x

• parallel ⇒ time, cost ↓ OR problem size ↑
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[{4>‘^^,K^d,j  =  l,...,N^  =  12],l=l,2,L  =  3 

K^e  =  }Ce(4>p  + 

J  J 


o  Surrogate  (train): 

^=1,  Ke{<p)\n^  =  H{4>) 

•^  =  2,  3,  /Ce(0)|Qf  = 

BLUE2[(0j-\:^^^-l),j  =  1,  .  .  .  ,;V^] 
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o  Model  prediction  error  estimators  (test); 

\K^,  -  ^  =  1,2,3 

=  {.016,  .043,  .034}  (•  •  •  {.016,  .006,  .019}) 

o PAC statement:

With confidence > 1 − ε2 = .85,

V^€  {1,2,3}, 

3V^  C  with  \  V^)  <  £1  =  .15, 

such  that,  'icf)  g  r^. 

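For $\mathcal{D} \subset \Omega$,

L > 1, ρ(p) uniform: $m_\rho(\mathcal{D}) = \int_{\mathcal{D}} dp \,/ \int_{\Omega} dp$
L = 1, ρ(p) general: $m_\rho(\mathcal{D}) = \int_{\mathcal{D}} \rho(p)\, dp$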


pi 


=  mp(oi\ri) 
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□  Surrogate-based  optimization: 


<()*  =  arg  min  \lCe{4>)  -  A| 

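λ = .5 ⇒ $\tilde{\phi}^{*}$ = .29: $\tilde{k}^{*} = \tilde{\mathcal{K}}_e(\tilde{\phi}^{*}) = \lambda$

o A posteriori error analysis (PN): η = .20

o Prediction Neighborhood = [.24, .33]

$\arg\min_{\mathcal{D}' \subset \Omega \ \mathrm{s.t.}\ m_\rho(\mathcal{D}') = \eta} \; \max_{\phi' \in \mathcal{D}'} \Delta(\phi', \tilde{\phi}^{*})$ (Euclidean)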

o With confidence > 1 − ε2 = .85,

∃ E ⊂ PN, with $m_\rho(E)/\eta > 1 - \varepsilon_1/\eta = .25$,

such that, ∀φ ∈ E,

$|\mathcal{K}_e(\phi) - \lambda| < .04\,(U_2) + .07\,(\delta) = .12$ (ca.)

o  Joint  statement  for  multiple  studies 

q  =  Ij .  .  . ,  Q , 

With  confidence  >  1  -  £2  =  .85, 

and  ^[Q] 
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□  Design  scenario(s) 


o  Preliminary  design  process  (potentialities) 

g  =  1, . . . ,  Q, 

0*  [9] ,  fc*  ^  ^  ^ [g]  '•es'^rces  £5  [g]  *  [g]  ^  [g] ] 

assume:  Vg  €  {1, ...  .Ql.Vcji)  € 

acceptable 

(design-performance  “volume") 

o  Final  design  process  (realization) 

SELECT  Df  =  DM[^»W^fc*W] 

CALCULATE  =  ;Ce(^*M)  (K~,[r]) 

IF  \k^  -  aM|  <  tr-W,  PROCEED 
ELSE 

FIND  $*  €  such  that 

\)Ce($*)  -  AM|  <  roW 

SET  D'f  =  D^'^^[$*,ICe($*)],  PROCEED 

o Unvalidated alternative:

preliminary  considerations,  investments  ? 
interactive  ? 


APPLICATION:  (Stokes)  Drag  on  Body 


□ 


fore-aft symmetric

smooth,  star-shaped 


=  1  cm/s 

6(0) 

Ao 

M  (oil) 

bid)  €  W. 

<  1 

A/— *00 

axisymmetric 


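$b(\theta; a) = \sum_{i=1}^{M} a_i Q_i(\theta)$

$p = a \in \Omega \subset \mathbb{R}^{M}$; s = $f_D$ (drag force)

$S(p) = F_D(b)$: $\tilde{S}(p) = \tilde{F}_D(b) = 6\pi\mu U_\infty \tilde{r}(b)$

□ Exact design problem: λ = 304 dynes
$b^{*}(\theta) = \arg\min_{b(\theta) \in W^{+},\; b(\theta) < C(\theta)} |F_D(b) - \lambda|$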

2  (cm)  - , 


6(0) 

/  _  c(e) 
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2  (cm) 


□ $\mathcal{M}^{0}$ (simulation): negligible noise

• FEM ℙ2-ℙ1 isoparametric discretization

•  Uzawa  nested  conjugate  gradient  iteration 

□  Importance  function  p(A): 

(no  b(0)  <  C(0)  restriction) 

o  Random  shape  process  B(d]  A) /r(B) 

^  probability  density  p(A) 

•  radial  scale:  equivalence  relation 

•  azimuthal  scale: 

E(B(e^- A)B(e2\  A))/f2(B)  =  g(ei,e2\ ^b,  ^b) 
o  Test  sample:  6(0;  aj)/r(6(0;  a^)),  j  =  1, . . . 


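□ Validation: η = .18, ε2 = .18
• model prediction error estimator(s):

$U = \max_{j \in \{1,\ldots,N=30\}} \left| \dfrac{F_D(b_j)}{\mu U_\infty \tilde{r}(b_j)} - 6\pi \right| / 6\pi$

• PAC statement: (ε1 = .06)
With confidence > 1 − ε2 = .82,

$m_\rho\left\{ a \in \Omega : \left| \dfrac{F_D(b)}{\mu U_\infty \tilde{r}(b)} - 6\pi \right| / 6\pi < U \right\} > 1 - \varepsilon_1 = .94$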


□ Surrogate-based optimization:

$\tilde{b}^{*}(\theta) = \arg\min_{b(\theta) \in W^{+},\; b(\theta) < C(\theta)} |\tilde{F}_D(b) - \lambda|$
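The validation numbers quoted for the two applications (N = 12 with ε1 = ε2 = .15, and N = 30 with ε1 = .06, ε2 = .18) are consistent with a standard order-statistics argument. If the region where the surrogate error exceeds the largest observed test error had measure ε1 or more, then N independent test points would all miss it with probability at most (1 − ε1)^N. The sketch below assumes this is the argument behind the slides; the function names are hypothetical.

```python
def confidence_lower_bound(n_test, eps1):
    """Confidence that the exceedance region has measure below eps1.

    Assumes n_test independent test points drawn from the importance
    density, with the bound U taken as the largest observed test error.
    """
    return 1.0 - (1.0 - eps1) ** n_test


def min_test_samples(eps1, eps2):
    """Smallest N with (1 - eps1)**N <= eps2, i.e. confidence >= 1 - eps2."""
    n = 1
    while (1.0 - eps1) ** n > eps2:
        n += 1
    return n


# Stokes drag example from the text: N = 30, eps1 = .06, eps2 = .18
print(confidence_lower_bound(30, 0.06))
# Conductivity example from the text: eps1 = eps2 = .15
print(min_test_samples(0.15, 0.15))
```

The sample size is fixed in advance by ε1 and ε2 alone and does not depend on the dimension of the input space or on the form of the surrogate, which is the sense in which the framework is nonparametric.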
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□  A  posteriori  error  analysis  (VC): 
o  Draw  (m  =  1)  ’Proximal  Candidate(s) 


a(b(e))  =  arg  min  A(a'^^,b*(6)) 

a' sX.  a'h/r{b)<C(9)  ^(b) 

vl  = 

arg  min  max  A("— \b*(0)) 

V'cnsx.  mp(V')=v  a'€V'  f(b(0\a')) 


A*(€  VD  ~  j:p(A)\^v  ,  B*(e)  =  oc(B)- 

/  A  r 


B(e,A*) 


r(B(e,A*)) 


$\Delta(B_1, B_2) = \mathrm{vol}(\mathrm{xor}(B_1, B_2)) / \mathrm{vol}(B_2)$


o With confidence > 1 − ε2 = .82,

$|F_D(B^{*}) - \lambda| / \lambda < U' = .22$ [.06]

Joint for different λ, C, Δ(·,·) (Hausdorff, ...).


Manufacturing  Simulation 


Dr.  Kurt  Fickle 

U.S.  Army  Research  Laboratory,  Aberdeen  Proving  Ground 
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vehicles  and  thin-skin  composites  for  aircraft  bodies. 
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Manufacturing  simulation  is  a  means  to  leverage  market 
forces. 
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"Design  the  machine 
which  makes  the  widget 


Why Not Industry/Acade
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•  Specifications  affect  cost,  so  the  user  must  be  i 


Leveraging  Simulation 


Interactive  Collaborative  Environments 




Why  Composites? 


•  Army  has  long  history  in  low-cost  applications. 


Example  Army  Applications 
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AH-64D  Apache  Longbow 


Resin  Transfer  Molding 


Resin  injection/Curing  Part  removal 


Darcy’s  Law  Flow 


+  Cure  Kinetics 


Designed  Resin 
Flow  Front 


Impacts  affordability! 


MATERIALS/PROCESSING 


Data  for  Modeling  Materials-Processing 
Plasmas:  The  Impact  of  Parallel  Computers 


Dr.  B.  Vincent  McKoy 


California  Institute  of  Technology 


Data  for  Modelling  Materials-Processing  Plasmas: 
The  Impact  of  Parallel  Computers 


Coworkers:  C.  Winstead  and  H.  Pritchard 


Support:  National  Science  Foundation 

Air  Force  Office  of  Scientific  Research 
Sematech,  Inc. 


Vincent  McKoy 
Phone:  818-395-6545 
Fax:  818-568-8824 
E-mail:  B  VM @  start asel .  caltech.  edu 


•  Schematic  illustration  of  the  essential  role 
played  by  electron  impact  dissociation  in 
these  low-temperature  non-equilibrium 
plasmas: 


•  In  these  plasma  reactors,  electrons 
acquire  temperatures  of  hundreds  of 
thousands  of  degrees  Kelvin  while  the 
heavy  particles  have  temperatures  of 
hundreds  of  degrees  Kelvin. 


• Electron impact dissociation of feed
gases leads to the production of reactive
fragments.


•  These  reactive  fragments  are  responsible 
for  much  of  the  chemistry  brought  about 
by  these  plasmas. 
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FEED  GAS 
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reactive"  fragments  etch  surface 


Processing  of  materials  by  low-temperature 
plasmas  is  one  of  the  most  widely  applied 
high-value  manufacturing  processes  in 
United  States  industries: 


•  Fabrication  of  semiconductor  integrated 
circuits  and  other  electronic  devices. 

•  Hardening  of  tools,  dies  and  industrial 
metals. 

•  Anticorrosion  and  other  coatings 
deposited  on  surfaces. 

•  Lighting  and  displays. 

•  Hazardous  waste  remediation. 


• Plasma reactors and processes in use today
have  been  developed  mainly  on  the  basis  of 
empiricism  and  statistical  optimization. 

More  rational  design  procedures  are  needed 
to  meet  future  needs. 

•  The  evolution  of  simulation  tools  for  plasma 
processes  will  depend  on  progress  in  plasma 
modeling  techniques  and  on  the  enhancement 
of  the  collision  cross  section  data  base 
needed  for  calculating  plasma  properties. 
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•  Because  of  the  fragmentary  state  of  the  cross 
section  data  base,  the  hazardous  nature  of 
several  feed  gases,  and  the  difficulty  of 
measurements,  particularly  for  molecular 
fragments,  computational  approaches  to  the 
generation  of  these  cross  sections  have  the 
potential  to  make  a  significant  contribution. 


• Our objective is to exploit the high-performance
and cost-effective computing
provided by parallel computers, along with a
judicious choice of measurements, to obtain
the cross sections for electron-molecule
collisions needed for robust modelling of
these plasmas.
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•  A  tragic  example  of  the  hazardous  nature  of 
feed  gases  of  interest  in  plasma  etching. 
Taken from the recent article by M. A.
Dillon et al., J. Phys. B 27, 1209 (1994).
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Elastic  scattering  and  some  vibrational  excitation  cross 
sections  for  electron  collisions  with  Si2H6 


M A Dillon†, L Boesten‡, H Tanaka‡, M Kimura† and H Sato§

† Argonne National Laboratory, 9700 S. Cass Avenue, Argonne, IL 60439, USA
‡ Department of Physics, Sophia University, Chiyoda-ku, Tokyo 102, Japan
§ Department of Information Science, Ochanomizu University, Bunkyo-ku, Tokyo 112,
Japan


Vibrational excitation

During the extension of this work to the realm of vibrationally inelastic scattering, a
fatal explosion at Osaka University led to a curtailment of research on silane in Japan.
New regulations have made it impossible to conduct research using silanes without very
costly laboratory modifications. These circumstances have dictated the premature end
of our investigation, but nevertheless we have obtained some results that are relevant
to the present discussion and are worth reporting.


The calculation of cross sections for
collisions of low-energy electrons for the
gases of interest in these plasmas, e.g., BCl3
and SiCl4, is computationally intensive.


In  contrast  to  conventional  supercomputers, 
the  high  speeds  and  large  memory  of  parallel 
computers  make  a  computational  approach  to 
generating  these  cross  sections  feasible. 


In  these  studies  we  use  a  multichannel 
extension  of  the  variational  principle 
originally  introduced  by  J.  Schwinger.  This 
multichannel  variational  principle  was 
specifically  formulated  for  applications  to 
electron-molecule  collisions. 


Our  variational  principle  can  be  applied  to 
both  elastic  and  electronically  inelastic 
collisions  with  general  polyatomic 
molecules.  Polarization  effects  can  be 
included  via  closed  channels. 


As in the original Schwinger method, the
trial wave function need not satisfy
scattering boundary conditions; square-integrable
functions such as Cartesian
Gaussians may be employed.


•  Application  of  this  multichannel  variational 
principle  leads  to  a  system  of  linear 
equations 

A  X  =  b 

whose  solutions  yield  the  scattering 
amplitudes. 
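Solving a complex linear system of this kind can be illustrated with a small, self-contained Python sketch. It is not the production solver used in the work described here. It uses plain Gaussian elimination with partial pivoting and a single right-hand-side vector, and the 2 x 2 matrix entries are arbitrary placeholders with no physical meaning.

```python
def solve_complex(a, b):
    """Solve A x = b for complex A (n x n) and b (length n).

    Gaussian elimination with partial pivoting on an augmented copy,
    so the caller's inputs are left unmodified.
    """
    n = len(a)
    m = [[complex(v) for v in row] + [complex(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest pivot magnitude for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0j] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


a = [[2 + 1j, 1j], [1, 3 - 1j]]   # placeholder complex matrix
b = [1 + 0j, 2j]                  # placeholder right-hand side
x = solve_complex(a, b)
print(x)
```

In the application described above, b has one column per incident-wave condition, so the elimination would be done once and the back substitution repeated for each column.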


•  A  and  b  are  complex  matrices  with  elements 

4 = <  -P)H+VP-  yG(+)  FI  0 .  > 

and 

(rj,  Fj,  r^)exp  {ik„ 
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•  In  a  basis  of  Cartesian  Gaussian  functions,  i.e., 

$N_\alpha (x-A_x)^{\ell}(y-A_y)^{m}(z-A_z)^{n}\exp(-\alpha|\mathbf{r}-\mathbf{A}|^2)$,

all  integrals  needed  in  the  construction  of  the 
matrix  elements  of  A  and  b  can  be  evaluated 

analytically, except for those of $VG_P^{(+)}V$.


• The integrals arising in the $\langle \phi_i | VG_P^{(+)}V | \phi_j \rangle$
term have no known analytic form and must
be evaluated by quadrature.


• The high-performance and cost-effective
computing provided by parallel computers is
the key to our ability to evaluate these
$VG_P^{(+)}V$ integrals efficiently and to
effectively use this procedure to study
electron-molecule collisions.
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•  To  obtain  the  matrix  elements  of  V  we 
must  evaluate  and  transform  a  large  number 
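(e.g., $10^{9}$ - $10^{10}$) of two-electron integrals of
the type

$\int d^3r_1\, d^3r_2\; \alpha(\mathbf{r}_1)\,\beta(\mathbf{r}_1)\,\frac{1}{r_{12}}\,\gamma(\mathbf{r}_2)\exp(i\mathbf{k}\cdot\mathbf{r}_2)$

where α, β and γ are Cartesian Gaussian
functions of the form

$N_\alpha (x-A_x)^{\ell}(y-A_y)^{m}(z-A_z)^{n}\exp(-\alpha|\mathbf{r}-\mathbf{A}|^2)$.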


•  These  integrals,  however,  can  be  evaluated 
analytically  with  a  program  containing  less 
than  two  thousand  lines  of  Fortran. 
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•  To  obtain  these  matrix  elements  we  load  the 
Fortran  code  for  evaluating  these  integrals 
on  each  of  the  microprocessors  of  the 
parallel  computer  and  deal  out  a  subset  of 
integrals  to  every  microprocessor. 


•  Transformation  of  these  integrals  stored  in 
the  microprocessors  to  generate  the  final 
matrix  elements  is  achieved  via  distributed 
multiplication  of  large  complex  matrices. 


•  Sustained  performance  of  5  GFLOP  is 
typical. 
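The "deal out a subset of integrals to every microprocessor" step can be pictured with a short Python sketch. The round-robin rule, the `(a, b, c)` labelling of integrals by basis-function indices, and the use of the α-β exchange symmetry are illustrative assumptions; the original Fortran code and its actual distribution scheme are not shown in the source.

```python
def integral_labels(n_basis):
    """Hypothetical labels (a, b, c) for the integrals over alpha, beta, gamma.

    The product alpha(r1) * beta(r1) is symmetric under a <-> b, so only
    pairs with a <= b are listed.
    """
    return [
        (a, b, c)
        for a in range(n_basis)
        for b in range(a, n_basis)
        for c in range(n_basis)
    ]


def deal(work_items, n_workers):
    """Round-robin deal: item i goes to worker i % n_workers."""
    hands = [[] for _ in range(n_workers)]
    for i, item in enumerate(work_items):
        hands[i % n_workers].append(item)
    return hands


labels = integral_labels(4)
hands = deal(labels, 3)
for rank, hand in enumerate(hands):
    print(rank, len(hand))
```

Because each integral is evaluated independently, this stage needs no communication between processors. Communication enters only in the later transformation step, which the text describes as a distributed multiplication of large complex matrices.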
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We  have  made  significant  progress  in  an 
ambitious project to generate cross section
sets for the species BCl3, BCl2, BCl, SiCl4,
SiCl3, SiCl2, and SiCl which occur in boron
trichloride (BCl3) etching plasmas. These
data will enable robust simulations of BCl3
etching plasmas.


Since only limited experimental data are
available for SiCl4, and none at all, to our
knowledge, for the remaining molecules,
these computations address a clear need.


These  studies  also  provide  a  challenging 
opportunity  to  explore  electron  collision 
processes  —  particularly  those  involving 
radicals  —  about  which  little  is  known. 


[Figure: cross section vs. electron energy (eV)]

• Cross section for electron scattering by SiCl4:
curve, elastic cross section calculated without
target polarization; data points, measured total cross
section (H.-X. Wan et al., J. Chem. Phys. 91,
7340 (1989)).


[Figure: cross section vs. impact energy (eV)]

• Dissociation cross section for BCl3. This cross section is
obtained  as  the  sum  of  the  802  the  6e'  — > 

3a2  and  the  2a2  —¥  electron-impact  excitation 

cross  sections. 
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Cross  Section  (10 
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•  Dissociaton  cross  section  for  BCl.  This  cross  section  in- 

•  eludes  the  ^’^A,  and  channels  arising  from 

27t  — )•  Stt  electron-impact  excitation. 
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•  This  work  is  an  early  illustration  of  the 
significant  impact  that  scalable  parallel 
computers  can  be  expected  to  have  on  our 
ability  to  model  complex  physical  and 
chemical  systems. 


•  A  proposal  to  develop  high-performance 
simulation  tools  for  the  plasma  processes 
used  in  microelectronics  fabrication  is  being 
prepared.  The  Technology  Computer  Aided 
Design  (TCAD)  group  of  Intel  (Santa  Clara) 
will  be  a  coinvestigator.  Parallel  computers 
will  play  a  central  role  in  this  effort. 


•  Such  simulation  tools  can  contribute  to  the 
development  and  early  evaluation  of  new 
equipment  for  microelectronics  fabrication 
and,  hence,  to  the  competitiveness  of  the 
industry. 
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Computer  Applications  for  Crystal  Growth 

Phenomena 


Professor  Thomas  C.  Halsey 


Exxon  Research  &  Engineering 
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Growing  Zn  electrodeposit 

(Matsushita  et  al.) 


Surface leader discharge

(Niemeyer et al.)


A  crude  model  for  electrodeposition: 


• 1) Metallic surface is an equipotential of Ω.

• 2) ΔΩ = 0 in solution, as there is no space charge.

• 3) In solution, current i(x) = ∂Ω.

• 4) At electrodeposit surface, growth rate is proportional
to i(s) = ∂Ω.
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Diffusion-limited  aggregation 


•  The  Witten-Sander  algorithm  :  growth  without 
surface  tension. 


•  Particle  arrives  from  infinity.  When  it  strikes 
seed,  it  sticks. 

• Model for transport-controlled growth in
electrodeposition, colloidal aggregation,
viscous fingering.

• Structures grown are highly branched,
ramified: "fractal"


$n \sim r^{D}$


• D < d, the dimension of space. Younger
aggregates are more dense.
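A minimal on-lattice version of the Witten-Sander algorithm can be sketched as follows. The square lattice, the launch radius (largest cluster radius plus 5), the kill radius (three times the launch radius), and the sticking probability of 1 are choices made for this illustration and are not taken from the talk.

```python
import math
import random

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def grow_dla(n_particles, seed=0):
    """Grow a diffusion-limited aggregate on a square lattice.

    Each walker is released on a circle outside the cluster, random-walks
    until it reaches a site adjacent to the cluster, and then sticks.
    Walkers that stray beyond the kill radius are discarded.
    """
    rng = random.Random(seed)
    cluster = {(0, 0)}              # the seed particle
    r_max = 0.0                     # largest distance of any cluster site
    while len(cluster) < n_particles:
        launch = r_max + 5.0
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x = int(round(launch * math.cos(angle)))
        y = int(round(launch * math.sin(angle)))
        while True:
            dx, dy = rng.choice(STEPS)
            x, y = x + dx, y + dy
            if x * x + y * y > (3.0 * launch) ** 2:
                break               # wandered off; release a new walker
            if any((x + sx, y + sy) in cluster for sx, sy in STEPS):
                cluster.add((x, y)) # first contact: stick
                r_max = max(r_max, math.hypot(x, y))
                break
    return cluster


cluster = grow_dla(60)
print(len(cluster), max(math.hypot(x, y) for x, y in cluster))
```

Measuring the number of particles n inside radius r for much larger clusters than this one is the usual way to estimate the dimension D in n ~ r^D. Simulations at that scale rely on faster walker-stepping schemes than the plain lattice walk used here.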
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Electrostatic  analogy 

Suppose  that  the  probability  density  of  a  particle 
that  has  not  yet  touched  the  cluster  surface  is 

p(x,t);  this  probability  must  be  zero  on  the  dotted 
positions. 


Elsewhere, 

$\partial_t\, p(x,t) = \Delta\, p(x,t)$


If we write
$\Omega(x) = \int dt\; p(x,t)$
then since p(x,0) = 0, and p(x,∞) = 0, we have

$\Delta\, \Omega(x) = 0$


And the probability G(s) that the particle first
arrives at the surface position s is

$G(s) = \partial_n \Omega(s)$
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Generalization:  dielectric 


breakdown  models 


• Call the local electric field at the surface in the
above problem P(s).

• Then in DLA, the growth probability

$G(s) = P(s) / \int ds\, P(s)$


• Niemeyer et al. introduced a set of models for
dielectric breakdown in which

$G(s) = P^{\eta}(s) / \int ds\, P^{\eta}(s)$


• In this family of models, the dimension D is observed
to be a monotonically decreasing function of η.

In two dimensions, the lower bound on D is D = 1.
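On a discretized surface, the growth rule of the dielectric breakdown models reduces to normalizing the η-th power of the local field over the candidate growth sites. The short Python sketch below shows only that step; the field values are arbitrary placeholders, and computing the field itself (a Laplace solve around the cluster) is not shown.

```python
def growth_probabilities(fields, eta):
    """Discrete form of G(s) = P(s)**eta / sum over s' of P(s')**eta."""
    weights = [p ** eta for p in fields]
    total = sum(weights)
    return [w / total for w in weights]


fields = [1.0, 2.0, 4.0]                   # placeholder surface fields P(s)
print(growth_probabilities(fields, 1.0))   # eta = 1: the DLA rule
print(growth_probabilities(fields, 2.0))   # larger eta favors high-field tips
print(growth_probabilities(fields, 0.0))   # eta = 0: every site equally likely
```

Raising η concentrates growth on the highest-field sites, which is consistent with the observation above that the dimension D decreases as η increases.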
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A  model 
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Model 
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FIG.  5.  A  150  rHV)-par!ic!e  aggregare.  This  aggregate  was  srown  asm 
^ticking  pnmarniir\  Df. A  aliz^nthm,  wuh  snckini!  prohahiiitv  *  -  0  ! 
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FIG  4.  A  1  lOOCKKparticie  apgregaie.  This  aggrecaic  was  grown  using 
slicking  prohahilii\  1)1. A  alcorilhni.  with  slicking  probahilir\  /  - 




Clcif\  Loclude 


•  Surface  tension  (curvature  c/P 
Surroce  I's  source  o'f'  particle. 

fw) 

• Anisotropic surface tension
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Monte  Carlo  simulation  of  diffusion  controlled  colloid  growth  rates  in  two  and 
three  dimensions 

Paul  Meakin 

Central Research and Development Department,

Experimental Station, E. I. du Pont de Nemours and Company, Wilmington, Delaware

J. M. Deutch

^  Department  of  Chemistry,  Massachusetts  Institute  of  Technology.  Cambridge.  Massachusetts  02! 


[Figure: four panels of simulated aggregates, each 400 lattice units across]
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Frequency  dependence  of  the 
double  layer  impedance 


If  an  electric  potential  is  applied  between  a  metallic 
surface  and  an  electrolyte,  a  screening  layer  of 
charge  will  come  to  the  surface,  forming  a  double 
layer. 


The  impedance  of  this  system  should  be  that  of  a 
resistor  and  a  capacitor  in  series. 

$Z(\omega) = R + 1/(i\omega C)$

A "Helmholtz" impedance.

But often one sees instead:

$Z(\omega) = R + 1/((i\omega)^{\eta} C)$

A  "constant  phase  angle"  impedance. 
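The name "constant phase angle" can be checked numerically: the capacitive part of the second expression has phase −ηπ/2 at every frequency, and η = 1 recovers the ideal −π/2 of the series resistor-capacitor case. The values of R, C, and η in the Python sketch below are arbitrary placeholders.

```python
import cmath
import math


def impedance(omega, r, c, eta=1.0):
    """Z(omega) = R + 1 / ((i * omega)**eta * C); eta = 1 is the ideal RC case."""
    return r + 1.0 / ((1j * omega) ** eta * c)


def capacitive_phase(omega, r, c, eta=1.0):
    """Phase angle (radians) of Z(omega) - R."""
    return cmath.phase(impedance(omega, r, c, eta) - r)


R, C, ETA = 1.0, 1.0, 0.8   # placeholder values
for omega in (1.0, 10.0, 1000.0):
    print(omega, capacitive_phase(omega, R, C, ETA), -ETA * math.pi / 2.0)
```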


Origin  of  constant  phase  angle  impedance  lies  in 
roughness of surfaces:


[Figure: surface profile; horizontal axis: distance (μm), 0 to 80]


This  leads  to  inhomogeneities  in  current  arriving  at 
surface  during  charging  process. 


This  process  can  be  analyzed  using  many  of  the 
same  methods  used  in  studying  diffusive  growth 
models. 
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Impedance  can  be  expressed  in  terms  of  a  set  of 
coefficients  b(n),  which  give  the  probabilities  that  a  random 
walker  starting  at  one  electrode  will  bounce  n  times  from  the 
other  electrode  before  returning 


$Z^{-1}(\omega) = 1 - \sum_n b(n)\, X^{n}$,

$X = 1/(1 + i\omega)$

Impedance is a generating function for bounce probabilities.
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C-406 


High  Performance  Computational 
Pursuit  of  Strategic  Materials  Properties 

Dr.  Warren  E.  Pickett 


Naval  Research  Laboratory 


High  Performance  Computational  Pursuit 
of  Strategic  Materials  Properties 
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Oct  31  -  Nov  2,  1994 


DoD  High  Performance  Computing 
Modernization  Plan  Resources 

(in  use  during  FY95) 
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USAE  Waterways  Experimental  Station  (CEWES) 


Cray C916/16512 ("C90")

•  512  MWord  (1  GB)  high  speed  memory 

•  1  GWord  (2  GB)  solid  state  device  (SSD)  memory 

•  16  processors 

•  16  GFLOP  peak  performance 

Cray Y-MP8/8128 ("Y-MP")

•  128  MWord  (256  MB)  high  speed  memory 

•  256  MWord  (512  MB)  SSD  memory 

•  8  processors 

•  2.7  GFLOP  peak  performance 
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Maui  High  Performance  Computing  Center 

Massively  parallel  IBM  SP2  machine  architecture 

400  node  complex: 

•  30  frames  of  SP2  nodes 

1  frame  =  8  ’wide’  nodes,  or 

16  ’thin’  nodes. 

•  2  switch  frames  to  connect  other  frames 

Node  =  IBM  RS/6000  Model  590  processor 

•  66  MHz 

•  266  MFLOP 

400 Processors ⇒ 100+ GFLOP peak performance

Node configurations available:

Type    RAM       Disk    Paging
Thin    64 MB     1 GB    180 MB
Wide    64 MB     1 GB    180 MB
Wide    128 MB    1 GB    244 MB
Wide    256 MB    1 GB    308 MB

NRL  Code  6690  (Complex  Systems  Theory  Branch)  applications: 

•  first  principles  electronic  structure:  more  materials/properties 

•  parametrized  tight  binding:  surfaces/interfaces/defects 

•  quantum  Monte  Carlo:  one  "walker"  per  node 

•  clusters  &  molecules:  search  configuration  space 

•  molecular  dynamics  :  million-atom  systems 

•  quantum  many  body  theory:  under  consideration 
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Novel  Superconductors  and  Semiconductors 
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The  crystal  structure  of 
superconducting  LuNi2B2C 
and  the  related  phase  LuNiBC 

T.  Siegrist,  H.  W.  Zandbergen*,  R.  J.  Cava, 

J. J. Krajewski & W. F. Peck Jr
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Maximum  T  vs.  Year 
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Energy  (eV) 
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Conventional,  3D,  high  DOS,  strong  coupling  superconductor 
Reminiscent  of  "old”  boride,  carbide,  nitride  materials. 


Navy  Thermoelectric  Technology:  Status  and  Potential 

•  Clean,  Quiet,  Reliable  Cooling  and  Power  Generation  Technology 
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Submarines:  Possibility  for  generating  power  from  reactor  while 
operating  silently  (no  moving  parts,  reduced  coolant  circulation?) 


Skutterudite Structure


(Empty)  Perovskite  Structure 


Energy Band Structure
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IrSb 


Implications of quasi-linear dispersion ( > 3x1 holes/cm^):

Quasi-Linear  .  Parabolic 

Band  Dispersion: 
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Relation  to  Experiment  IrSbg  (Slack): 

Theory  a  ■  3.45eVA  Experiment  (Hall  Number]:  ri.  •  1.1x10 
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negligible  changes  In  electrical  transport 
May  be  a  good  strategy  for  reducing  k  and  thereby  increasing  Z.  Can  it  be  made? 

• Total energy calculations: Insertion of Xe is endothermic. ΔE ~ 10 kcal/mole
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Sqi:ne  properties  of  semiconducting  IrSba 

Glen  A.  Slack 

GE Research and Development Center, Schenectady, New York 12301

Veneta  G.  Tsoukala 

Oxford  University,  Oxford,  England 
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FIG.  7.  The  Secbcck  coefficient  vs  temperature  for  IrSbj  and  IrojRhojSbj. 
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Theoretical  determination  that 
electrons  act  as  anions  in  the 
electride Cs+(15-crown-5)2·e−

David J. Singh*, Henry Krakauer†,

Christopher Haas† & Warren E. Pickett*

* Complex Systems Theory Branch, Naval Research Laboratory,
Washington DC 20375-5345, USA

† Department of Physics, College of William and Mary, Williamsburg,
Virginia 23187-8795, USA


Electrides are crystalline salts formed from complexed alkali-metal
cations. There has been some dispute as to whether the
valence electron from the alkali ion becomes a trapped interstitial
anion or resides at or near the alkali-metal nucleus. If the former
description holds, electrides would represent stoichiometric
counterparts of ionic insulators containing 'F-centre' electronic
defects. Experiments have so far failed to resolve the question.
Here we present ab initio self-consistent density-functional
calculations of the electron distribution in the electride
Cs+(15-crown-5)2·e−. We find that a spatially localized electron is located
at the anion site, in accord with the F-centre model. Although the
potential is in fact repulsive in this region, the electron is apparently
forced to reside here by the need to lower its kinetic energy.
We suggest that this picture may hold for other electrides as well.
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Competitive  With  Experiment?; 
The Future of Molecular Modeling


Dr.  Douglas  Dudis 


Wright-Patterson  AFB 
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AVOID  EXCESS  TESTING  OF  CANDIDATE  MATERIALS 
CIRCUMVENT  PREMATURE  ELIMINATIONS  DUE  TO  POOR  TESTS 
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Goddard,  Caltech 


Atomistic Modeling
~ 1994 Snapshot
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With  Experiment 


NONLINEAR  OPTICAL  ORGANICS: 
PHENOMENA  TO  APPLICATIONS 




TRENDS  IN  NETWORK  SPEEDS 
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cs  and  Times  for  a  Number  of  Optical  Bistahlr 


Decreasing  Switching  Time 
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EXTENDED-HUCKEL 


There  are  Two  Kinds  of 
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Skate  to  where  the  puck  will  be. 


COMPUTATIONAL  FLUID  DYNAMICS 


Exploiting  Massive  Parallelism 
to  Simulate  Complex  Turbulent  Flow 


Professor  Paul  R.  Woodward 


University  of  Minnesota 


EXPLOITING  MASSIVE  PARALLELISM 
TO  SIMULATE  TURBULENT  FLOWS 


Paul  R.  Woodward 

University  of  Minnesota 

Army  High  Performance  Computing 
Research  Center 

November  2,  1994 

Purpose:  Exploit  massive  parallelism  in  fluid 
dynamics  applications  in  order  to  achieve: 

1)  Unprecedented  flow  accuracy, 

2)  Unprecedented  flow  complexity, 

3) Generate “experimental” data sets of demonstrated
accuracy to guide construction of
theoretical models (e.g. turbulence closure),

4) Overcome long-standing difficulties:

•  Accurate  high  Reynolds  number  flows, 

•  Accurate  tracking  of  multifluid  interfaces, 

•  Flow  in  or  around  complex  boundaries. 




I. Ways in which massively parallel computing
technology is changing the computational fluid
dynamics paradigm.

Algorithms which are preferred by MPP’s:

1) Explicit or iterative implicit methods which
update cells based solely on local data.

2) Regularly structured grids, where each cell is
treated in an identical fashion and for which
there is no need for indirect addressing.

3) Capturing schemes, where special features of
the flow, such as shocks or multifluid
interfaces, are automatically captured and
handled  by  the  scheme  without  special 
tracking  techniques  which  demand  a  much 
more  elaborate  treatment  for  these  special 
cells. 

II. Piecewise-Parabolic Method

1) Developed in collaboration with Colella,
Fryxell, Edgar, Dai, Porter, and Bailey.

2) Time-dependent, compressible flow with
strong shocks, multiple fluids, general
equations of state, complex stationary or
moving boundaries, magnetic fields.

To come: implicit-explicit; improved
multifluid treatment; quadrilateral grids.

3) Scientific visualization environment.

4) Fortran-P precompiler.

5)  PPMLIB  project  for  CrayT3D  & _ . 

6) Efficient massively parallel implementations.

• 512-node CM-5: 8 Gflops (2-D),

• 512-node CM-5: 11 Gflops (3-D),

• 256-node Cray T3D: 3.75 Gflops (3-D),

• 8-processor Cray C-90: 3.5 Gflops.

• SGI Challenge Array, 16 machines,
20 processors per machine (100 MHz),
20 FDDI rings: 4.9 Gflops (32-bit arithmetic).

• SGI Power Challenge Server,
16 processors (MIPS R-8000, 75 MHz):
1.55 Gflops (32-bit arithmetic).

• Cray C-90 CPU: 450 Mflops.

• MIPS R-8000, 75 MHz: 98.5 Mflops.

• DEC Alpha workstation: 36 Mflops.

• HP 735 workstation: 30 Mflops.

III. Operations to avoid, in priority order:

1)  Data  movement  from  the  memory  of  one 
processor  to  that  of  another. 

2)  Unbalanced  loads  &  idle  processors. 

3)  Conditional  execution  of  significant  code 
blocks. 

4)  Interprocessor  synchronization  events. 

5)  Indirect  addressing. 


IV. Tools and tricks we developed for SIMD (CM-5):

1)  Restrict  ourselves  to  self-similar  algorithms, 
for  which  identical  programs  can  be  used  to 
update  either  the  whole  grid  or  any 
subdomain  of  the  same  topology. 

2) Pass information between neighboring
processors only when updating strips of fake
cells outside the boundaries of a processor’s
subdomain of the grid.

3) Developed Fortran-P precompiler to translate
our self-similar Fortran-77 codes into efficient
CM-Fortran for the Connection Machine.

4) Write code as if memory were infinite, and
Fortran-P precompiler automatically
equivalences arrays where this is possible.

5) Write vector code for each node, using
vector logic (cvmgm).
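
Items 1) and 2) above can be illustrated with a minimal sketch. This is not the PPM code: it assumes a simple three-point smoothing update on a periodic 1-D grid, and the names `update`, `step_whole`, and `step_decomposed` are hypothetical. The point is only that one routine updates either the whole grid or any subdomain, provided each is padded with fake cells copied from its neighbors.

```python
def update(cells):
    """Advance one explicit step.

    `cells` carries one fake cell at each end; only the updated interior is
    returned, so the same routine serves the whole grid or any subdomain.
    """
    return [0.25 * cells[i - 1] + 0.5 * cells[i] + 0.25 * cells[i + 1]
            for i in range(1, len(cells) - 1)]


def step_whole(grid):
    # Periodic boundary: the fake cells are copies of the opposite ends.
    return update([grid[-1]] + grid + [grid[0]])


def step_decomposed(grid, nproc):
    # Assumes len(grid) is divisible by nproc.
    n = len(grid) // nproc
    new = []
    for p in range(nproc):
        lo, hi = p * n, (p + 1) * n
        left = grid[lo - 1]               # fake cell from the left neighbor
        right = grid[hi % len(grid)]      # fake cell from the right neighbor
        new += update([left] + grid[lo:hi] + [right])
    return new


grid = [float(i % 5) for i in range(16)]
assert step_decomposed(grid, 4) == step_whole(grid)
```

Because every subdomain runs the identical arithmetic on identical inputs, the decomposed result matches the whole-grid result exactly, which is what makes the neighbor exchange the only communication needed.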

V. Tricks we developed for MIMD (SGI Array):

1)  All  SIMD  tricks  listed  above. 

2)  Enhance  cache  performance  via: 

•  Reference  memory  almost  exclusively  at 
unit  stride. 

•  Interleave  primary  variables  in  memory  and 
block  these  arrays  to  increase  locality  of 
memory  references  in  local  transpose 
operations. 

•  Overcome  memory  bandwidth  limitations 
by  fitting  workspace  entirely  into  cache 
using  private  copies  of  shared  data. 

•  Equivalence  (via  Fortran-P  precompiler) 
scratch  arrays  in  workspace  so  that  it  fits 
more  easily  into  cache. 

3)  Multitasking  at  a  network  node  via: 

•  Explicitly  designate  very  large  tasks, 
generally  encompassing  many  loops  or  a 
whole  code  package. 

•  Remove  implied  barriers  at  ends  of 
multitasked  loops  and  replace  them  by 
conditional  barriers  (test  semaphores). 

•  Separate  send  from  receive. 

• Designate send and receive as assignments
to shared variables.

•  Never  multitask  disk  I/O  (already  parallel). 

•  Save  a  processor  for  Unix,  I/O,  network. 

4) Domain decomposition & load balancing
over the network:

• Domain decomposition generates tasks for
network nodes which maximize data reuse
and thus best overcome network latency
and bandwidth limitations.

•  Domain  decomposition  balances  memory 
loads. 

•  Reduce  frequency  of  message  passing  and 
increase  latency  tolerance  by  introducing 
explicitly  dimensioned  buffer  of  fake  zones. 

•  Separate  send  and  receive, 

•  Avoid  message  copying  (put  and  get), 

•  Designate  send  and  receive  by 
assignments  to  and  from  explicitly 
dimensioned  fake  boundary  arrays. 

(These  can  be  recognized  automatically  by 
the  Fortran-P  precompiler.) 

•  Balance  loads  over  network  by  further 
decomposition  into  subdomains  and 
dispatching  of  subdomains  over  network. 
Always  send  new  data  back,  so  that  each 
node  requires  only  one  additional 
subdomain  structure  and  workspace  in 
memory,  and  message  passing  topology 
remains  simple. 
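
The item on an explicitly dimensioned buffer of fake zones can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses a three-point smoothing update on a periodic 1-D grid, and the names `smooth`, `advance_whole`, and `advance_buffered` are hypothetical. With a buffer of `halo` fake zones on each side, a subdomain can take `halo` time steps between exchanges, because each step of a three-point scheme invalidates only one fake zone per side.

```python
def smooth(cells):
    # Three-point explicit update; the result is two cells shorter than the input.
    return [0.25 * cells[i - 1] + 0.5 * cells[i] + 0.25 * cells[i + 1]
            for i in range(1, len(cells) - 1)]


def advance_whole(grid, steps):
    # Reference: update the whole periodic grid one step at a time.
    for _ in range(steps):
        grid = smooth([grid[-1]] + grid + [grid[0]])
    return grid


def advance_buffered(grid, nproc, halo, steps):
    """Advance `steps` steps, exchanging fake zones once every `halo` steps."""
    size = len(grid)
    n = size // nproc                      # assumes an even split
    exchanges = 0
    for start in range(0, steps, halo):
        k = min(halo, steps - start)
        new = []
        for p in range(nproc):
            lo = p * n
            # One exchange: copy k fake zones from each neighbor.
            work = [grid[(lo + j) % size] for j in range(-k, n + k)]
            for _ in range(k):
                work = smooth(work)        # each step consumes one fake zone per side
            new += work
        grid = new
        exchanges += 1
    return grid, exchanges


grid = [float((7 * i) % 11) for i in range(24)]
result, exchanges = advance_buffered(grid, nproc=3, halo=4, steps=8)
assert result == advance_whole(grid, 8)
assert exchanges == 2
```

The trade is redundant computation in the buffer zones for fewer, larger messages, which is the latency tolerance the slide refers to.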




5) Plugging the numbers in for PPM and for
the Silicon Graphics Power Challenge Array:

• 3-level memory hierarchy:
4 MB cache, 2 GB shared at node,
32 GB distributed (assumes 16 nodes).

• 3-level latency hierarchy:
60 nsec, 0.8 μsec, 0.5 - 3 msec
(assumes switches reset on each event).

• 3-level bandwidth hierarchy:
1200 MB/s, 67 - 1200 MB/s, 60 - 180 MB/s
(assumes 3 HiPPI interfaces per node).

• 18 CPU’s on each of 16 nodes.

• 100 MB/s disk I/O on each of 2 nodes.

• Typical task executed by single CPU
(fits into 4 MB cache):

Pencil of 4x4 strips of 64 zones each:

4x4x(64+7)x1300 flop = 1.48 Mflop
--> 15.0 msec.

((4x4 + 4x2x4)x(64+14) + 4x4x64)x5x4 Byte
= 93.1 KB mem I/O
--> 1.32 msec.

• Typical task executed by single node
(fits into 2 GB shared mem):

2 time steps 64x64x64 zone subdomain:

6x16x16 pencils as above = 2.27 Gflop
--> 1.36 sec.

2x6x(2x10 + 4x2)x(64+14)^2x5x4 Byte
= 39.0 MB message I/O (read+write)
(27.9 MB time-limiting, 1 net link)
32.5 msec (to shared memory)
--> 0.464 sec (all to network)

(64^3 + (64+14)^3)x5x4 Byte data to & fro
= 14.1 MB over 1 net link
--> 0.234 sec (for load balancing)

• Typical task executed by 16-node Array
(fits into 32 GB distributed memory):

5000 time steps on 512x512x1024 grid:
256x256x256 brick at each node.
(Memory could accommodate a
problem twice this size.)

Each brick consists of 64 sub-bricks.

16x6 sub-brick faces sent over network
(as if 1/4 of the sub-bricks do network I/O).

2500x(16x(0.464 + 1.36) + 48x(0.0325 + 1.36))
= 2500 x 96.3 sec (raw computation)
= 66.9 hour = 2.79 day

After every 10 time steps, send
compressed description of fluid state,
with 2 Bytes per word, to the 2 nodes
which have attached 100 MB/s disk
subsystems.

500 compressed dumps of 2.5 GB each.

1.25 Tbyte data set to be archived.

Requires 500x2500/180 sec = 1.93 hour.

Hence total computation time = 68.8 hour

Total computation = 5807 Tflop.

Overall performance = 2323 Gflop / 99.1 s
= 23.4 Gflops


This estimate does not allow for
overlapping message I/O with
computation, but it also assumes perfect
load balance. Only 17 of the 18 CPU's
in each machine are used for
computation. The above figures indicate
that irregular loads would not result in a
significant performance degradation for
problems of this size (for the PPM code).
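
The estimate in item 5) can be re-derived with a short script. This is a sketch of the arithmetic only: the variable names are illustrative, the rates are those quoted on the slides (98.5 Mflops per R-8000 CPU, 17 computing CPUs per node, 1200 MB/s to shared memory, 60 MB/s per network link, 180 MB/s for archiving), and "MB" is assumed to mean 1024^2 bytes because that reproduces the quoted 39.0 MB and 14.1 MB. Recomputing from unrounded intermediates agrees with the slide totals to within about one percent.

```python
MB = 1024.0 ** 2               # assumption: binary megabytes
CPU_MFLOPS = 98.5              # MIPS R-8000, 75 MHz (slide table)
CPUS_COMPUTING = 17            # 17 of the 18 CPUs per node
SHARED_MB_S = 1200.0           # shared-memory bandwidth
NET_MB_S = 60.0                # one network link
ARCHIVE_MB_S = 180.0           # rate used in the slide's dump estimate

# Single CPU: one pencil of 4x4 strips of 64 zones.
pencil_flop = 4 * 4 * (64 + 7) * 1300
pencil_msec = pencil_flop / (CPU_MFLOPS * 1e6) * 1e3

# Single node: 2 time steps on a 64^3 sub-brick.
sub_brick_flop = 6 * 16 * 16 * pencil_flop
sub_brick_sec = sub_brick_flop / (CPUS_COMPUTING * CPU_MFLOPS * 1e6)
message_mb = 2 * 6 * (2 * 10 + 4 * 2) * (64 + 14) ** 2 * 5 * 4 / MB
shared_sec = message_mb / SHARED_MB_S
network_sec = 27.9 / NET_MB_S          # time-limiting share quoted on the slide

# 16-node array: per pair of steps, 16 sub-bricks use the network, 48 do not.
pair_sec = 16 * (network_sec + sub_brick_sec) + 48 * (shared_sec + sub_brick_sec)
compute_hours = 2500 * pair_sec / 3600.0
dump_hours = 500 * 2500 / ARCHIVE_MB_S / 3600.0   # 500 dumps of 2500 MB
total_hours = compute_hours + dump_hours

total_tflop = sub_brick_flop * 64 * 16 * 2500 / 1e12
gflops = total_tflop * 1e3 / (total_hours * 3600.0)

print(f"pencil:    {pencil_flop / 1e6:.2f} Mflop, {pencil_msec:.1f} msec")
print(f"sub-brick: {sub_brick_flop / 1e9:.2f} Gflop, {sub_brick_sec:.2f} sec")
print(f"messages:  {message_mb:.1f} MB per sub-brick")
print(f"total:     {total_tflop:.0f} Tflop in {total_hours:.1f} hour "
      f"= {gflops:.1f} Gflops")
```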

VI. Advantages of this approach for 2-D flow
problems:

1)  Simulations  on  grids  of  8  million  cells  are 
practical. 

2)  On  these  fine  grids,  captured  shocks, 
contact  discontinuities,  and  multifluid 
interfaces  are  “razor  sharp.” 

3)  On  these  fine  grids,  object  boundaries  of 
complex  shape  can  be  described  accurately 
without  the  use  of  body-fitted  grids. 

4) The PPM code runs at 8 Gflops on the CM-5
for these problems, and therefore one may,
in the supersonic regime, obtain statistically
steady flows from direct integration of the
governing  equations,  without  having  to 
develop  approximate  time-averaged  fluid 
equations.  Time  averages,  as  in  nature,  may 
be  taken  after  rather  than  before  the 
simulation  is  performed. 

5)  Because  no  body-fitted  grids  are  necessary, 
objects  of  complex  shape  may  be  moved 
through  the  grid  according  to  the  dynamic 
forces  acting  upon  them,  which  are 
computed  as  a  natural  part  of  the  PPM 
calculation.  Objects  may  even  change  shape 
if this is appropriate.

VII. Examples of 2-D flows computed on the CM-5
at the AHPCRC using the PPM code.

1)  The  interaction  of  a  Mach  1.3  shock  with  a 
wedge.  Direct  comparison  with  experiment. 

2)  Statistically  steady  Mach  4  flow  about  a 
circular  cylinder.  An  exhaustive  study  of 
convergence  properties  of  the  method. 

3)  2-D  analog  of  the  sabot  discard  process. 
Simulation  of  a  dynamic  process  with 
moving  boundaries  reacting  to  computed 
pressure  forces  generated  by  the  flow.  This 
computation  can  be  seen  as  a  large  eddy 
simulation. 

VIII.  Additional  advantages  of  this  approach  in  3-D: 

1)  On  sufficiently  fine  grids,  PPM  simulations 
can  be  viewed  as  large  eddy  simulations -- 
the  largest  scales  of  turbulent  motion  are 
resolved,  while  dissipation  of  this  turbulent 
energy into heat is accomplished on scales
which  are  unresolved  by  the  grid. 

2)  PPM  calculations  on  grids  of  up  to  a  billion 
cells  have  been  performed  on  equipment 
with  a  list  price  of  under  12  M$  (the  Silicon 
Graphics  Challenge  Array,  Sept.,  1993). 

3)  These  fine  grids  allow  us  at  last  to  get  an 
accurate  look  at  the  Kolmogorov  inertial 
range  of  homogeneous,  compressible 
turbulence and to compare its behavior to
turbulence  closure  models. 
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IX.  Examples  of  3-D  PPM  simulations. 

1) NSF Grand Challenge simulation of turbulent
compressible convection in a stratified
atmosphere. PPM parallelized on the Cray-C90
at the Univ. of Minnesota and on the
Cray-T3D at the Pittsburgh Supercomputing
Center. For the first time the interaction of
the convection cellular structures with
turbulent fluid motions can be studied in a
“first principles” calculation without recourse
to heuristic turbulence modeling.

2) Billion-Zone Grand Challenge simulation of
homogeneous, compressible turbulence using
PPM code parallelized on the 320-processor,
16-machine Challenge Array at Silicon
Graphics in Mt. View, California. For the first
time the detailed structures and behavior of
compressible turbulence in the Kolmogorov
inertial range may be studied, visualized, and
compared in detail with predictions of
turbulence closure models.
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X.  Where  does  this  lead? 

1)  Simplified  new  approaches  to  the  simulation 
of  complex,  turbulent  flows  which  exploit  the 
fine  grids  which  MPP’s  and  tens  of  Gflops 
make  possible. 

2)  Decreased  reliance  on  empirical  engineering 
models  and  increased  use  of  direct 
simulation,  with  the  hope  of  greater 
confidence  in  the  results  when  simulating 
flows  in  unexplored  regimes. 

3) Increasingly scalable application codes which
operate in a consistent fashion across wide
performance  and  capability  ranges  starting  at 
the  desk  top. 

4)  With  the  help  of  new  tools  like  our  Fortran- 
P  precompiler  and  scalable  libraries  for  CFD 
like  PPMLIB,  a  much  reduced  barrier  to 
massively  parallel  computation. 

5)  With  the  help  of  Gigabit  networking  and 
powerful  flow  visualization  tools  and 
systems,  such  as  those  developed  in  the 
AHPCRC’s  Graphics  and  Visualization 
Laboratory,  increasingly  interactive  and 
natural  visual  analysis  of  the  extremely  large 
data  sets  which  these  simulations  produce. 
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Large  Scale  Data  Acquisition  and 
Processing  in  Turbulent  Flows 
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Aircraft  Engine  Emissions  Reduction 
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Turbulent  Shear  Flows:  Inner  and  Outer  Scales 
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Four-Dimensional  (x-y-z-t)  Turbulence  Measurements 

Conserved Scalar Measurement ζ(x,t) (256^)


Scalar Energy Dissipation Field ∇ζ·∇ζ(x,t)
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Scalar  Imaging  Velocimetry  (SIV) 


Solution  of  an  inverse  problem  gives  the  velocities 
u(x,t)  from  fully-resolved  measurements  of  the 
scalar field ζ(x,t)


Variational formulation:

E_1 = \partial\zeta/\partial t + u \cdot \nabla\zeta - \frac{1}{Re\,Sc} \nabla^2 \zeta

E_2 = \nabla \cdot u

E_3 = \nabla u : \nabla u


X 


Minimizing  £'2  requires 


Scalar  Imaging  Velocimetry  (SIV) 


Discretizing in space puts these in matrix form
as

Au = b

256 x 256 x N volumes x 3 components requires:
b is 196,608 x N elements long
A is 3.87(10^10) x N^2 elements large


Direct solution methods are out of the question

Linear iterative methods are used

Diagonal dominance requirement restricts (α, β)
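
The size figures follow directly from the grid dimensions, as the sketch below shows. The function names are hypothetical. The Jacobi iteration is included only as a generic example of a linear iterative method whose convergence is guaranteed by diagonal dominance; the slides do not say which iterative scheme was actually used, and the 3 x 3 system is a toy stand-in for the real matrix.

```python
def siv_system_size(nx=256, ny=256, n_planes=1, components=3):
    """Return (length of b, number of elements in a dense A)."""
    unknowns = nx * ny * n_planes * components
    return unknowns, unknowns ** 2


def jacobi(A, b, iters=100):
    """Plain Jacobi iteration; converges when A is strictly diagonally dominant."""
    n = len(b)
    u = [0.0] * n
    for _ in range(iters):
        u = [(b[i] - sum(A[i][j] * u[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return u


unknowns, dense_elements = siv_system_size()
print(unknowns)                    # 196608
print(f"{dense_elements:.3g}")     # 3.87e+10

# Toy diagonally dominant system whose exact solution is [1, 1, 1].
A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [5.0, 6.0, 5.0]
u = jacobi(A, b)
```

Even for N = 1 a dense A would hold about 3.87 x 10^10 elements, which is why a direct solve is ruled out and only the nonzero couplings are worth storing.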

• Conserved Scalar Measurement ζ(x,t) (256^)


Scalar Energy Dissipation Field ∇ζ·∇ζ(x,t)


Scalar  Imaging  Velocimetry  Measurements 


Scalar  Imaging  Velocimetry  Computations 


Scalar  Imaging  Velocimetry  Computations 


Dynamical  Turbulence  Quantities 
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High  Performance  CFD  Simulation 
for  Priority  Real-Time  Applications 
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High  Fidelity  CFD  Simulation  for 
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Flickering  Methane-Air  Diffusion  Flame 

Sequence of temperature contours in time for a flickering methane-air flame shows clipped off
portion. Interval between frames is 10 ms.


What Is the DoD Applications Software Problem?
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Transferring  Parallel  Programming  Technology 
Provide  A  Library  of  Program  Shells  - 
A Team for Multigeneration Scalable Software
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Identified  Needs  for  Real  Time  Detailed  Simulation 
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VIRTUAL  CELL  EMBEDDING  (VCE) 

METHOD 
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VCE  Gridding 


VIRTUAL  CELL  EMBEDDING  (VCE) 

METHOD 
VCE Gridding: Cell Subdivision


Unsteady  Flow  Calculations  for  the  Navy  DDG51 
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Temperature  •  Entire  Domain 
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Comanche 

Fuselage with Main and Tail Rotor Actuator Disks


Pressure Contours

Flow Parameters:

Angle of Attack 0 deg.

Mach Number 0.26

Reynolds Number 14,000,000

A.C.B. Dimanlig and E.P.N. Duque, U.S. Army Aeroflightdynamics Directorate - ATCOM


Laboratory for Computational Physics and Fluid Dynamics - NRL / Code 6400


Other  Real  Time  Simulation  Opportunities 
•  High  Resolution  Contaminant  Transport  Model 
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Applications  for  a 

High  Resolution  Contaminant  Transport  Model 
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AUTOMATIC  TARGET  RECOGNITION 


Automatic Target Recognition (ATR)
for Wide Area Surveillance

Dr.  Jonathan  Schonfeld 


Advanced  Research  Projects  Agency 
ATR and wide area surveillance


The  Tier  II  Plus  Unmanned  Air  Vehicle 
Deployments, composition, etc. include ground, aviation, naval and electronic orders
of battle; entries shown are for ground forces only.

uses terrain analysis and collateral information

SNL/ARL:  Focused  search  for  TCT’s  with  high  resolution  SAR 

-  High  pd  demonstrated  with  limited  target  set 


Sensor/ATR  Performance 


Synthetic  Aperture 
Radar 
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ISAR  Target  at  Different  Angles 
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’xV  Resolution,  PWF  linages,  10°  Increments 


High-Leverage  ATR  Application  Program 
Candidates  for  Near-Term  Transition 
War Breaker


MSTAR  Technology 

(Moving  and  Stationary  Target  Acquisition  and  Recognition) 


Model-Driven  Reasoning 
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The  University  ATD/R  Initiative 
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After  less  than  one  year,  five  university  ATR  component  products 
evaluated  on  Lincoln  Laboratory  algorithm  testbed 

Two  outperform  existing  Government-sponsored  baseline 


Profile  of  University  ATR  Projects 


Future  Data  Processing  Challenges 
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ATR  Algorithm  Development  Using  Khoros 

Professor  Robert  A.  Hummel 


New  York  University 


ATR  Algorithm  Development 
using  Khoros 

(1) Perspectives on ATR, and the
concept  of  multisensor  systems 

(2)  An  ATR  Project  at  NYU 


Robert  Hummel,  NYU 


This talk is concerned with the use of simulation and computer modeling for
ATR applications. We begin with some perspectives on ATR, both the need
and the algorithmic underpinnings, and then discuss some of the methods that
are being used in the NYU ATD/R project funded by ARPA in the MSTAR
University Research Initiative.

Simulation impacts on ATR in three ways:

• In order to recognize a target, a system needs a model of the target, and must
compute a comparison between the observed object and the targets as predicted
by the models. The computation often requires supercomputer potential on an
embedded system.

• In developing the ATR algorithms, researchers simulate an embedded
system, using software tools that enable them to build a virtual ATR system.
This aspect is more challenging than it seems, and is greatly facilitated by
advanced software tools.

• ATR must be trained and tested against a large database of imagery. Since
acquisition of sample images is expensive, it is imperative to develop test
scenes and realistic background clutter through simulation methods.
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What  is  Khoros? 

• A poor-person’s AVS

• An image processing visualization system

• A visual programming system


Now distributed by Khoral Research, Inc.

This talk is a call for advanced software tools for supporting research in ATR,
especially in the aspect of simulating an embedded system by a laboratory
system. Khoros is a visual programming language for image processing,
funded by ARPA, that supports this goal and can be considered a prototype
model of software tool development.

An issue to consider in the development of software tools through government
funding is the extent to which the publicly available software displaces
commercial software that helps to drive a software industry.

ATR Challenges


• Find mobile missile launchers, TELs

• Expose massed armor movements

• Separate true targets from confusers, decoys, even when
counter-measures are applied

• Target munitions, submunitions, and provide surveillance aids


Iraq had hundreds of Scuds, shot over 80 during
ODS, and effective opposition capabilities were,
and remain, minimal

ATR is experiencing a resurgence of support and military interest, in part
motivated by the desire to locate missile launchers from platforms flying in
combat air patrol missions.

However, ATR in general has many other potential applications.

To partly explain the level of interest, consider how different the world would
be if the act of moving a tank or a missile launcher, or firing a mortar,
constituted an extremely risky action. In order to deter hostile action, one
needs information, and surveillance and detection require analysis, which
needs to be automated by ATR.
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ATR  Sensors 

• Sensors:

- FLIR, Second Generation and FPA’s

- Multispectral E/O, “color” in multiple spectral bands

- Radar: X band and MMW

• High-resolution 1-D, polarimetric

• Doppler beam sharpened modes

• SAR, spotlight and strip modes

• Moving target detection

• Vibration modes

-  Ladar 

•  Time  of  flight  ranging 

•  Interferometric  methods  for  ranging 

-  Lidar 

•  Chemical  composition  of  effluents 

ATR can make use of many different kinds of sensor inputs. The most
common sensor systems for ATR include FLIR and SAR, but real-beam radar,
LADAR, and even LIDAR (to, for example, examine exhaust characteristics of
vehicles) can be included.

The most interesting and successful ATR systems will make use of multiple
sensor inputs, and attempt to fuse the information, either by running separate
ATR algorithms on each sensor modality and combining the decisions, or
better, by combining features from different sensors.

It is important to model the sensor as well as the targets, since
simulation of the imagery, as well as recognition of the targets, depends on an
understanding of the sensors.

ATR Processors

• Processor power is no longer a problem

- On-board processor capabilities far exceed what the academic
community normally expects

- Typical architecture: Multiple Sparc SBC’s, a MIMD or pipeline
multiprocessor, and a SIMD mesh array computer

• Memory is no longer a problem

- Hundreds of megabytes can be assumed, for a price

- Rapid processing reduces memory needs

The challenge is in
the “algorithms” for
detection and recognition

Case Study: Aladdin processors, 0.5 Gflops

One of the reasons that ATR is increasingly realizable is that processor power
is now commensurate with the needs. Massive computer power (throughput)
is required, but processing needs are highly parallel. Further, by making the
ATR system hierarchical (proceeding from detection to recognition and
identification), processing power can be concentrated on small regions.

An example program to develop processors for ATR application is the Aladdin
program, funded by ARPA and administered by NVL-ESD. This program has
developed two different form-fitted half-gigaflop multiprocessors, and is
especially notable for developing large-scale wafer microprocessor
manufacturing. The processors are small (four inch diameter), and the biggest
problem is heat dissipation. Increasingly, the recognition processing can be
combined with the image formation (as in processing raw radar return as
opposed to image formation). Unique algorithmic methods are as important as
processor capabilities.

Operational Algorithms in ATR
Systems

• ATR systems are now being built!

• What are the embedded algorithmic methods?

- Statistical pattern recognition

• Radar CFAR algorithms. Quadratic classifiers on multiple simple features

- Matched filtering

• Often on edge images

• Carefully constructed filters, whitening filters

- Logical matched filtering

• Same as matched filtering, but using AND’s and OR’s in place of weighted sums

• E.g., Hausdorff measure; Hierarchical Generalized Hough Transform

Examples: MUSTRS and DAMOCLES
Comanche has a planned multisensor ATR!

There have been a number of prototype ATR systems built and tested. The
ATR algorithms are still wanting, but there has been some level of
performance. The algorithmic methods tend to be simple, and rely heavily on
matched filtering (convolution-based matching) using edge maps (edges
extracted from the literal imagery). More sophisticated methods involving
model-based vision are similar, but use logical connectives (and’s and or’s)
instead of the convolutions in matched filtering, which means that FFT
methods cannot be used for processing.
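
The distinction can be made concrete with a small sketch on a binary edge map. It is illustrative only: the function names, the 5 x 5 edge map, and the template are hypothetical, and the matched filter is evaluated by direct summation rather than by FFT. The matched filter forms a weighted sum at each offset, which is linear and could therefore be computed by FFT-based convolution; the logical variant asks that every template edge pixel be present (an AND), which is not a sum and so has no such shortcut.

```python
def matched_filter(image, template):
    """Weighted sum of image pixels under the template at every offset."""
    th, tw = len(template), len(template[0])
    rows = len(image) - th + 1
    cols = len(image[0]) - tw + 1
    return [[sum(template[r][c] * image[y + r][x + c]
                 for r in range(th) for c in range(tw))
             for x in range(cols)]
            for y in range(rows)]


def logical_match(image, template):
    """AND over template edge pixels: true only where every one is present."""
    th, tw = len(template), len(template[0])
    rows = len(image) - th + 1
    cols = len(image[0]) - tw + 1
    return [[all(image[y + r][x + c]
                 for r in range(th) for c in range(tw) if template[r][c])
             for x in range(cols)]
            for y in range(rows)]


edge_map = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
template = [
    [1, 0],
    [1, 1],
]

scores = matched_filter(edge_map, template)
hits = [(y, x)
        for y, row in enumerate(logical_match(edge_map, template))
        for x, hit in enumerate(row) if hit]
print(scores[1][1], hits)   # prints: 3 [(1, 1)]
```

The matched filter gives graded scores everywhere (a partial match at offset (2, 2) scores 2), while the logical match reports only the one offset where the full template is present.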

We mention the MUSTRS project, which developed an embedded ATR
system for the advanced cruise missile (now cancelled). It is a
multi-sensor, multi-look system that combines decisions from SAR and FLIR
ATR’s to improve ATR performance. The on-board processor includes a 196
by 480 SIMD 2-D mesh of processors, as well as quad i860 boards for SAR
processing.

The DAMOCLES project is wild. This submunition, being developed by
Textron, includes an Aladdin processor (using C40’s instead of C30’s) and
high throughput to detect targets using real-beam radar and FLIR. The
scenario includes launching an ATACM, dropping the DAMOCLES
submunition, which parachutes toward the target, with a rotating body used to
scan the region, and a “hockey-puck” munition tossed from the DAMOCLES
body. The intended fielded cost of the submunition is $25K per unit.
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The  key  to  ATR  is  stable  feature 
extraction 


•  Geometric  hashing,  and  any  matching  scheme,  depends 
on  stable  features 


Features  must  be  rich,  descriptive,  and  discriminative  of 
the  object 

In  order  for  the  Bayesian  result  to  work,  features  should 
be  independent 

-  This  means  that  under  the  assumption  of  the  presence  of  a  model, 
knowledge  of  a  particular  feature  provides  no  support  for 
presence  or  lack  of  presence  of  other  features 

-  Features  for  the  model  are  already  known  to  be  present 

-  E.g.,  line  edgels  are  not  independent 

•  Bad  news  for  many  systems 

• But corners, with angle bisectors, are independent

The NYU ATR project is emphasizing the development of advanced features
for matching target models against images. The matching techniques need to
be efficient, but the key to ATR is in the use of descriptive features. Whereas
edges are currently the main source of features, we believe that corners, blobs,
tricorners, and other features offer promise for using internal detail of targets in
order to enhance the recognition capabilities.

In  order  to  develop  useful  feature  extractors,  software  tools  are  needed  and 
simulation  facilities  must  be  disseminated  in  order  to  develop  and  test 
different  kinds  of  ATR  algorithms.  Likewise,  databases  of  example  images, 
both  synthetic  and  real,  need  to  be  made  available. 
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Feature  operator  design  is 
facilitated  by  the  use  of  Khoros 


Corner detection
Edge detection
Circle detection
Multicorner detection

The remainder of this presentation demonstrates the use of the software system
Khoros for the development of feature extraction methods, together with
preliminary results of object recognition.

We first show some example Khoros “workspaces” containing multiple
glyphs, which are the programming modules of Khoros (in a visual display
system called “cantata”). We also show some example FLIR imagery, and an
example of simple experimental enhancement, facilitated by the use of Khoros
modules. We demonstrate feature extraction, showing circle extraction,
corners, edges (represented by midpoints), and an image whose magnitude
gives the curvature magnitudes of isocontours, which should be useful for
corner detection that is independent of edge extraction. The recognition result
is based on using color EO imagery (from a TV camera), and models of an
M60 tank.

Most  of  the  results  and  work  represented  by  these  examples  were  produced  by 
NYU  graduate  student  Jyhjong  Liu. 

This is an example of the Cantata workspace with a number of glyphs,
used to extract edges. The glyph at the left is an image input,
and the other glyphs extract edges, fit lines, and display results.

Glyphs are hierarchical. This workspace is an expansion of the
vcb glyph from the previous workspace.


This set of glyphs is used to compute a normalized convolution. Note the use of the FFT in order to
perform the convolution in a reasonable amount of time.
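
The speed-up mentioned in this caption comes from the convolution theorem: a
convolution becomes a pointwise product after a Fourier transform. The sketch
below shows the idea in one dimension using only the Python standard library.
The caption does not define "normalized convolution"; the last function
assumes the common certainty-weighted form (convolve the weighted signal,
then divide by the convolved weights), which may differ from what the glyphs
compute. All function names are hypothetical.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two.
    The inverse transform is returned unscaled (caller divides by n)."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def circular_convolve_fft(a, b):
    """Circular convolution via pointwise product of spectra, O(n log n)."""
    n = len(a)
    spectrum = [x * y for x, y in zip(fft(a), fft(b))]
    return [v.real / n for v in fft(spectrum, inverse=True)]

def circular_convolve_direct(a, b):
    """Reference implementation straight from the definition, O(n^2)."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]

def normalized_convolve(signal, weights, kernel):
    """Assumed certainty-weighted form: conv(signal * w) / conv(w)."""
    num = circular_convolve_fft([s * w for s, w in zip(signal, weights)], kernel)
    den = circular_convolve_fft(weights, kernel)
    return [p / q if q > 1e-12 else 0.0 for p, q in zip(num, den)]

signal = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
kernel = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]  # smoothing kernel centered at index 0
fast = circular_convolve_fft(signal, kernel)
slow = circular_convolve_direct(signal, kernel)

# A constant signal with some samples marked unreliable (weight 0).
weights = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0]
filled = normalized_convolve([5.0] * 8, weights, kernel)
```

The two convolution routines agree to rounding error, and under the assumed
definition a constant signal is reproduced exactly even where individual
samples carry zero weight.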


This is a first-generation FLIR image of some trucks. Note that the engine compartments show
more brightly than the rest of the vehicles.

This is a mobile missile launcher, viewed with a second-generation FLIR. Note how the target is
washed out due to the lack of temperature variations.

This view of a mobile missile launcher, which is slightly different from the previous one, has been
subjected to a simple enhancement algorithm, available as a single glyph in the Khoros module set.


These are the edges extracted from a FLIR image of an M60 tank.


The  results  of  a  circle  detection  algorithm,  based  on  the  Hough  transform,  are  shown  here. 
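
As a rough illustration of the Hough-transform idea named in this caption,
the sketch below lets each edge point vote for every candidate center that
lies one radius away from it; the true center collects the most votes. It
assumes the radius is known in advance and uses a synthetic set of edge
points. The function name and parameters are hypothetical, and this is not
the Khoros implementation.

```python
import math
from collections import Counter

def hough_circle_centers(edge_points, radius, angle_steps=360):
    """Accumulate votes for circle centers of a known radius.

    Each edge point votes once for every integer grid cell lying at
    distance `radius` from it. Returns a Counter mapping (cx, cy) to votes.
    """
    votes = Counter()
    for x, y in edge_points:
        seen = set()  # ensures one vote per cell per edge point
        for step in range(angle_steps):
            theta = 2.0 * math.pi * step / angle_steps
            cell = (int(round(x - radius * math.cos(theta))),
                    int(round(y - radius * math.sin(theta))))
            if cell not in seen:
                seen.add(cell)
                votes[cell] += 1
    return votes

# Synthetic edge points sampled on a circle of radius 5 centered at (10, 12).
points = [(10 + 5 * math.cos(2 * math.pi * k / 16),
           12 + 5 * math.sin(2 * math.pi * k / 16)) for k in range(16)]
accumulator = hough_circle_centers(points, radius=5)
best_center, best_votes = accumulator.most_common(1)[0]
```

When the radius is unknown, the accumulator gains a third dimension and the
same voting is repeated for each candidate radius.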


The points mark the locations of corners, with the bisector of each corner indicated by a short line,
representing oriented corner features.
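
One way to compute such an oriented corner feature, given the corner vertex
and a point on each of its two edges, is to sum the two unit edge directions;
the sum points along the angle bisector. This is a minimal sketch with
hypothetical names and inputs, not the detector used to produce the figure.

```python
import math

def corner_bisector(vertex, end_a, end_b):
    """Return (unit bisector direction, opening angle in radians) for the
    corner at `vertex` formed by rays toward `end_a` and `end_b`."""
    ax, ay = end_a[0] - vertex[0], end_a[1] - vertex[1]
    bx, by = end_b[0] - vertex[0], end_b[1] - vertex[1]
    len_a, len_b = math.hypot(ax, ay), math.hypot(bx, by)
    if len_a == 0.0 or len_b == 0.0:
        raise ValueError("edge endpoints must differ from the vertex")
    # The sum of the two unit edge vectors lies along the angle bisector.
    ux, uy = ax / len_a + bx / len_b, ay / len_a + by / len_b
    norm = math.hypot(ux, uy)
    if norm < 1e-12:
        raise ValueError("rays are opposite; the bisector is undefined")
    cos_angle = max(-1.0, min(1.0, (ax * bx + ay * by) / (len_a * len_b)))
    return (ux / norm, uy / norm), math.acos(cos_angle)

# A right-angle corner at the origin with edges along +x and +y.
bisector, opening = corner_bisector((0.0, 0.0), (3.0, 0.0), (0.0, 2.0))
```

The corner position together with the bisector direction gives a feature with
both a location and an orientation, which is what the short lines in the
figure depict.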


The points and lines mark the midpoints and orientations of straight line segments fitted to the edge
detection results.

The darkness of the pixels on the right represents the magnitude of the curvature of the isocontour
passing through the point in the image on the left.
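
The curvature of the isocontour through a pixel can be computed directly from
image derivatives, without extracting edges first. The sketch below uses the
standard expression |Ixx Iy^2 - 2 Ixy Ix Iy + Iyy Ix^2| / (Ix^2 + Iy^2)^(3/2)
with central finite differences. The function name and the synthetic test
image are hypothetical, and the operator used for the figure may differ in
smoothing and scaling.

```python
def isophote_curvature(image, x, y):
    """Curvature magnitude of the isocontour through interior pixel (x, y).

    `image` is a list of rows indexed as image[y][x]. Derivatives are
    estimated with central differences on the raw pixel values.
    """
    ix = (image[y][x + 1] - image[y][x - 1]) / 2.0
    iy = (image[y + 1][x] - image[y - 1][x]) / 2.0
    ixx = image[y][x + 1] - 2.0 * image[y][x] + image[y][x - 1]
    iyy = image[y + 1][x] - 2.0 * image[y][x] + image[y - 1][x]
    ixy = (image[y + 1][x + 1] - image[y + 1][x - 1]
           - image[y - 1][x + 1] + image[y - 1][x - 1]) / 4.0
    grad_sq = ix * ix + iy * iy
    if grad_sq == 0.0:
        return 0.0  # flat neighborhood: curvature is undefined, report 0
    numerator = ixx * iy * iy - 2.0 * ixy * ix * iy + iyy * ix * ix
    return abs(numerator) / grad_sq ** 1.5

# Synthetic image whose isocontours are circles centered at (8, 8).
size = 17
bowl = [[float((x - 8) ** 2 + (y - 8) ** 2) for x in range(size)]
        for y in range(size)]
kappa = isophote_curvature(bowl, 11, 12)  # pixel at distance 5 from the center
```

For this synthetic image the isocontour through the chosen pixel is a circle
of radius 5, so the computed curvature is 1/5. High values of this measure
mark sharply bending contours, which is why it is a candidate corner detector.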

This is a preliminary recognition result based on matching of edges to a tank model.